Fast Differentiable Clipping-Aware Normalization and Rescaling

07/15/2020 ∙ by Jonas Rauber et al.

Rescaling a vector δ⃗ ∈ ℝ^n to a desired length is a common operation in many areas such as data science and machine learning. When the rescaled perturbation ηδ⃗ is added to a starting point x⃗ ∈ D (where D is the data domain, e.g. D = [0, 1]^n), the resulting vector v⃗ = x⃗ + ηδ⃗ will in general not be in D. To enforce that the perturbed vector v⃗ is in D, the values of v⃗ can be clipped to D. This subsequent element-wise clipping to the data domain does, however, reduce the effective perturbation size and thus interferes with the rescaling of δ⃗. The optimal rescaling η to obtain a perturbation with the desired norm after the clipping can be iteratively approximated using a binary search. However, such an iterative approach is slow and non-differentiable. Here we show that the optimal rescaling can be found analytically using a fast and differentiable algorithm. Our algorithm works for any p-norm and can be used to train neural networks on inputs with normalized perturbations. We provide native implementations for PyTorch, TensorFlow, JAX, and NumPy based on EagerPy.


1 Introduction

Images, audio recordings, measurement sequences, and other data can be represented as vectors living in a space such as ℝ^n. Images, for example, are often represented as vectors in [0, 1]^n or [0, 255]^n, where n is the total number of pixels. In data science, machine learning, vision science and other disciplines it is common that we perturb these vectors, that is we add other vectors to them. For example, in machine learning it is common to add random noise to the input data as a form of data augmentation that regularizes the model and leads to better generalization. In vision science, we add perturbations of a controlled size to images and then measure how well humans can still perceive the content of these images. Mathematically speaking, we start with a perturbation vector δ⃗ ∈ ℝ^n. We then rescale it using a non-negative scalar η to the desired norm ε, that is we choose η such that ‖ηδ⃗‖_p = ε. This can be trivially solved as

η = ε / ‖δ⃗‖_p

with ‖⋅‖_p denoting the p-norm. Finally, we add our rescaled perturbation ηδ⃗ to our data point x⃗ ∈ D and obtain our perturbed data point v⃗ = x⃗ + ηδ⃗.

The problem occurs when the perturbed data point v⃗ no longer lies within the data domain D. Whether this happens depends on the size ε of the perturbation, the location of the starting point x⃗ within the data domain (e.g. whether it is close to the boundary of the domain) and of course the data domain itself (an unbounded domain such as D = ℝ^n will never be violated). For a bounded domain such as D = [0, 1]^n and a non-zero perturbation δ⃗, there always exists a scale η such that x⃗ + ηδ⃗ ∉ D.

The most common solution for this problem is simply clipping the perturbed data point v⃗ to the data domain. Mathematically, the element-wise clipping to a bounded data domain D = [a, b]^n can be written as

clip(v_i, a, b) = min(max(v_i, a), b)

for all i ∈ {1, …, n}. Unfortunately, whenever the clipping actually changes a value, it reduces the norm of the effective perturbation clip(x⃗ + ηδ⃗, a, b) − x⃗ and makes it smaller than the desired size ε of the original perturbation ηδ⃗.
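As a small illustration (with hypothetical values, assuming D = [0, 1]^n and the Euclidean norm), naive rescaling followed by clipping yields an effective perturbation that is noticeably smaller than intended:

import numpy as np

x = np.array([0.9, 0.1, 0.5])
delta = np.array([1.0, -1.0, 1.0])
eps = 0.6
eta = eps / np.linalg.norm(delta)      # naive rescaling, ignoring the clipping
v = np.clip(x + eta * delta, 0, 1)     # element-wise clipping to [0, 1]
print(np.linalg.norm(v - x))           # approx. 0.374 instead of the desired 0.6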

If we are interested in controlling the effective perturbation size after clipping (e.g. in vision science) or fully utilizing our perturbation budget (e.g. in adversarial robustness research), we thus need to increase the scale η of the perturbation to counterbalance the clipping. Increasing η does however also increase the amount of clipping, thus leading to an iterative process. While this iterative process can be solved using a binary search, this would be slow and non-differentiable.
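For reference, such a binary search might look as follows. This is a minimal sketch (assuming p = 2, D = [0, 1]^n, and that ε does not exceed the maximum achievable norm), not code from the original report:

import numpy as np

def rescaling_binary_search(x, delta, eps, steps=50):
    # start from the rescaling that would be exact without clipping
    lo, hi = 0.0, eps / np.linalg.norm(delta)
    # grow the upper bound until the clipped perturbation is large enough
    while np.linalg.norm(np.clip(x + hi * delta, 0, 1) - x) < eps:
        hi *= 2.0
    # bisect on eta; every step needs a full clipping and norm evaluation
    for _ in range(steps):
        eta = 0.5 * (lo + hi)
        norm = np.linalg.norm(np.clip(x + eta * delta, 0, 1) - x)
        lo, hi = (eta, hi) if norm < eps else (lo, eta)
    return 0.5 * (lo + hi)

Each iteration only halves the remaining interval, and the discrete comparisons make the result non-differentiable with respect to x and delta.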

In this tech report, we show that the interference between clipping and rescaling can be resolved analytically using a fast and differentiable algorithm that directly finds the optimal rescaling η. Our algorithm works for any p-norm and can be used to train neural networks on inputs with normalized perturbations. We provide native implementations for PyTorch (pytorch), TensorFlow (tensorfloweager), JAX (jax), and NumPy (numpy) based on EagerPy (rauber2020eagerpy).

2 Problem

In section 1, we described how our problem is caused by the interference between (a) rescaling the perturbation to the desired norm and (b) clipping the perturbed data point to the data domain. Both operations influence the effective perturbation size, and more upscaling of the perturbation also causes more clipping and thus a reduction of the effective perturbation size. Here we formalize our problem as a mathematical equation that we then solve analytically in section 3: Find η ≥ 0 such that

‖clip(x⃗ + ηδ⃗, a, b) − x⃗‖_p = ε (1)

with known x⃗, δ⃗, ε, a, b. Without the clipping to [a, b], this could be trivially solved as

η = ε / ‖δ⃗‖_p. (2)

3 Solution

In this section, we show how to solve Equation 1 for η despite the clipping (see Equation 3). The main insight is that we can write the p-th power of the left side of Equation 1, that is ‖clip(x⃗ + ηδ⃗, a, b) − x⃗‖_p^p, as a piecewise linear function of η^p:

‖clip(x⃗ + ηδ⃗, a, b) − x⃗‖_p^p = ∑_{i=1}^{n} min(η^p |δ_i|^p, s_i^p), where s_i = b − x_i if δ_i ≥ 0 and s_i = x_i − a otherwise. (3)

This piecewise linear representation can be efficiently computed and inverted to solve Equation 1 for η^p and ultimately for η. The exact algorithm to do this for p = 2 is shown in section 4.
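As a quick numerical illustration of this insight for p = 2 and D = [0, 1]^n (hypothetical example values, not taken from the original report), the clipped squared norm coincides with the sum of element-wise minima from Equation 3:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
delta = rng.normal(size=10)

def lhs(eta):
    # p-th power (p = 2) of the left side of Equation 1
    return np.sum((np.clip(x + eta * delta, 0, 1) - x) ** 2)

def rhs(eta):
    # piecewise linear in eta^2: each element contributes min(eta^2 * delta_i^2, s_i^2)
    s = np.where(delta >= 0, 1 - x, x)
    return np.sum(np.minimum(eta**2 * delta**2, s**2))

for eta in (0.1, 0.5, 2.0):
    assert np.isclose(lhs(eta), rhs(eta))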

4 Algorithm and Implementation

A basic NumPy implementation of the algorithm to solve Equation 1 is given in Listing 1. A fully working open-source BSD-licensed implementation of the algorithm with batch support is available on GitHub (https://github.com/jonasrauber/clipping-aware-rescaling). It is based on EagerPy (rauber2020eagerpy) and works natively with PyTorch, TensorFlow, JAX, and NumPy. The algorithm is only shown for p = 2, but it generalizes directly to other p-norms by replacing square and sqrt with the corresponding functions.

import numpy as np


def clipping_aware_rescaling(x, delta, eps):
    """Calculates eta such that
    norm(clip(x + eta * delta, 0, 1) - x) == eps.

    Args:
        x: A 1-dimensional NumPy array.
        delta: A 1-dimensional NumPy array.
        eps: A non-negative float.

    Returns:
        eta: A non-negative float.
    """
    delta2 = np.square(delta)
    # distance of each element to the domain boundary it moves towards
    space = np.where(delta >= 0, 1 - x, x)
    # values of eta^2 at which the individual elements start to be clipped
    f2 = np.square(space) / delta2
    ks = np.argsort(f2)
    f2_sorted = f2[ks]
    # slopes of the piecewise linear function between consecutive breakpoints
    m = np.cumsum(delta2[ks[::-1]])[::-1]
    dx = np.ediff1d(f2_sorted, to_begin=f2_sorted[0])
    dy = m * dx
    y = np.cumsum(dy)
    # first breakpoint at which the squared norm reaches eps^2
    j = np.flatnonzero(y >= eps**2)[0]
    # solve the linear equation within that segment
    eta2 = f2_sorted[j] - (y[j] - eps**2) / m[j]
    eta = np.sqrt(eta2).item()
    return eta
Listing 1: NumPy code solving Equation 1 for p = 2, a = 0, b = 1
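As a quick sanity check (not part of the original listing; the example values are arbitrary), the returned eta can be plugged back into the left side of Equation 1:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)       # a random starting point in [0, 1]^n
delta = rng.normal(size=100)     # an unnormalized perturbation direction
eps = 1.5

eta = clipping_aware_rescaling(x, delta, eps)
v = np.clip(x + eta * delta, 0, 1)
print(np.linalg.norm(v - x))     # approx. 1.5, the desired norm after clipping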

5 Applications

In this section, we describe two applications of this algorithm for adversarial robustness research, but emphasize that this algorithm is in no way restricted to adversarial perturbations or to images.

5.1 Adversarial Noise Attacks

Testing the robustness of deep neural networks or other machine learning models against simple noise (e.g. Gaussian noise, uniform noise, etc.) can be phrased as a naive adversarial attack. It first draws a random perturbation from the noise distribution (independent of the sample that will be perturbed). It then normalizes and rescales the perturbation to the desired size (e.g. measured by a p-norm) and adds it to the sample. In general, such a random noise perturbation will however change some input values such that they are outside of the domain of valid samples (e.g. a pixel value that is no longer between 0 and 255). Therefore, the perturbed samples need to be clipped to the valid space before they are passed through the neural network, that is values larger (smaller) than the upper (lower) bound need to be replaced with the upper (lower) bound. Unfortunately, such clipping in general reduces the effective perturbation size, and thus the already naive adversarial attack does not even fully utilize its perturbation budget. When such adversarial noise attacks were originally introduced by rauber2017foolbox, this problem was solved iteratively and approximately using a binary search over the scale of the perturbation. Using the algorithm presented in this tech report, the new adversarial noise attack implementations in rauber2020foolboxnative directly scale the perturbation to achieve the desired perturbation size after clipping, thus improving both attack effectiveness (exact solution) and performance (non-iterative algorithm).
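A minimal sketch of one such attack step (not Foolbox's actual implementation; clipping_aware_rescaling refers to Listing 1, the data domain is assumed to be [0, 1]^n, and eps is assumed to be achievable) could look like this:

import numpy as np

def l2_noise_perturbation(x, eps, rng):
    # draw noise independently of the sample x
    delta = rng.normal(size=x.shape)
    # exact, non-iterative rescaling (Listing 1 expects 1-dimensional arrays)
    eta = clipping_aware_rescaling(x.ravel(), delta.ravel(), eps)
    # after clipping, the effective perturbation has L2 norm eps
    return np.clip(x + eta * delta, 0, 1)

The perturbed sample x_adv = l2_noise_perturbation(x, eps, rng) then satisfies ‖x_adv − x‖₂ = eps up to numerical precision, i.e. the attack fully utilizes its perturbation budget.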

5.2 Learning Adversarial Noise

In rusak2020increasing, the distribution of the adversarial noise is learned rather than fixed to obtain a worst-case noise distribution. To make the noise maximally effective, it needs to fully exploit its perturbation budget. Using the above algorithm, this is possible while still being able to backpropagate through the rescaling and clipping.
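For illustration, a minimal differentiable PyTorch port of Listing 1 (p = 2, D = [0, 1]^n) might look as follows. This is a sketch, not the official EagerPy-based implementation; gradients flow into x and delta through the gather and arithmetic operations, while the sorting indices and the segment index are treated as constants, which is exactly what the piecewise linear structure requires:

import torch

def clipping_aware_rescaling_torch(x, delta, eps):
    # assumes eps does not exceed the maximum achievable norm
    delta2 = delta.square()
    space = torch.where(delta >= 0, 1 - x, x)
    f2 = space.square() / delta2
    ks = torch.argsort(f2)
    f2_sorted = f2[ks]
    # reverse cumulative sum of the sorted squared deltas (the segment slopes)
    m = torch.flip(torch.cumsum(torch.flip(delta2[ks], dims=[0]), dim=0), dims=[0])
    dx = torch.diff(f2_sorted, prepend=f2_sorted.new_zeros(1))
    y = torch.cumsum(m * dx, dim=0)
    # first breakpoint at which the squared norm reaches eps^2
    j = int(torch.nonzero(y >= eps**2)[0])
    eta2 = f2_sorted[j] - (y[j] - eps**2) / m[j]
    return eta2.sqrt()

x = torch.rand(10)
delta = torch.randn(10, requires_grad=True)
eta = clipping_aware_rescaling_torch(x, delta, 0.5)
eta.backward()  # delta.grad now holds the gradient of eta with respect to delta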

Acknowledgements

J.R. acknowledges support from the Bosch Research Foundation (Stifterverband, T113/30057/17) and the International Max Planck Research School for Intelligent Systems (IMPRS-IS).

References