Adversarial learning studies vulnerabilities in machine learning; see e.g. (Vorobeychik & Kantarcioglu, 2018; Joseph et al., 2018; Biggio & Roli, 2017; Lowd & Meek, 2005). Understanding the optimal attacks an adversary might carry out is important, as it prepares us to manage the damage and helps us develop defenses. Time series forecasting, specifically with autoregressive models, is widely deployed in practice (Hamilton, 1994; Box et al., 2015; Fan & Yao, 2008) but has not received the attention it deserves from adversarial learning researchers. An adversarial attack in this context means that an adversary subtly perturbs a dynamical system at the current time, thereby influencing the forecasts about a future time. Prior work (Alfeld et al., 2016, 2017) pointed out vulnerabilities in autoregressive models under very specific attack assumptions. However, it was not clear how to formulate general attacks against autoregressive models.
This paper makes three main contributions: (1) We propose a general attack framework against autoregressive models that subsumes (Alfeld et al., 2016, 2017). (2) We formulate the optimal (i.e. most subtle and severe) attack as an optimal control problem. (3) We solve for the optimal attacks with control techniques, specifically Linear Quadratic Regulator (LQR), Model Predictive Control (MPC), and iterative LQR (iLQR).
2 A General Adversarial Attack Framework Against Autoregressive Models
To fix notation, we briefly review time series forecasting using autoregressive models. There are two separate entities:
1. The environment is a dynamical system with scalar-valued states at time . The environment has a -th order dynamics and is potentially subject to zero-mean noise with . Without interference from the adversary, the environmental state evolves as
for . We take the convention that if . We allow the dynamics to be either linear or nonlinear; for linear dynamics we will show that the optimal adversarial attacks have a closed-form solution, while for nonlinear dynamics we will show how to approximate the optimal attack using iterative Linear Quadratic Regulator.
2. The forecaster makes predictions of future environmental states, and will be the victim of the adversarial attack. In this paper we consider a linear autoregressive model, regardless of whether the environment dynamics is linear or not. We also allow the possibility . At time , the forecaster observes and uses the most recent observations to forecast the future values of the environmental state. Since the quality of a forecast may depend on when it is made, we use the notation to denote the forecast made at time about a future time . Specifically, the forecaster uses a standard model to predict the state at time by
where are coefficients of the model. The model may differ from the true environment dynamics even when is linear: for example, the forecaster may have only obtained an approximate model from a previous learning phase. Once the forecaster predicts , it can plug the predicted value into (2), shift time by one, and predict , and so on. Note that all these predictions are made at time . In the next iteration, when the true environment state evolves to and is observed by the forecaster, the forecaster will make predictions , and so on.
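The plug-in-and-shift forecasting procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function name `ar_forecast`, the coefficient layout (intercept first, then lag-1 through lag-p coefficients), and the example numbers are all hypothetical.

```python
import numpy as np

def ar_forecast(coef, history, steps):
    """Roll a linear AR(p) model forward `steps` times from the latest observations.

    coef    : [c, a_1, ..., a_p], intercept followed by lag coefficients (assumed layout)
    history : observed states, oldest first; must contain at least p values
    Returns the forecasts for the next `steps` time points, all made "now".
    """
    c = coef[0]
    lags = np.asarray(coef[1:], dtype=float)
    p = len(lags)
    window = list(history[-p:])              # most recent p values, oldest first
    forecasts = []
    for _ in range(steps):
        recent_first = np.array(window[::-1])    # lag-1 value first
        nxt = c + float(lags @ recent_first)
        forecasts.append(nxt)
        window = window[1:] + [nxt]          # plug the prediction in and shift by one
    return forecasts

# Hypothetical AR(2) forecaster with two observed states.
preds = ar_forecast([0.5, 0.6, -0.2], history=[1.0, 2.0], steps=3)
```

When a new state is observed, the same function would simply be called again with the extended history, which gives the forecasts made at the next time step.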
We next introduce a third entity – an adversary – who wishes to control the forecaster’s predictions for nefarious purposes. The threat model is characterized by three aspects of the adversary:
(i) knowledge: The adversary knows everything above. To clarify, however, at time the adversary must decide on an attack action before seeing the noise .
(ii) goal: The adversary wants to force the forecaster’s predictions to be close to some given adversarial reference target , for selected pairs of of interest to the adversary. Furthermore, the adversary wants to achieve this with “small attacks”. These will be made precise below.
(iii) action: The adversary can perform a state attack on the underlying environmental states, see Figure 1. Let be the attack (the “control input”) at time , which is applied to the environment dynamics via:
3 Attacks Formulated as Optimal Control
We now present an optimal control formulation for the state attack. We define the adversary’s state attack problem by the tuple
is the environment dynamics (zero-mean noise is considered part of the dynamics),
is the initial state (it is straightforward to generalize to an initial distribution over , which adds an expectation in (8)),
is the forecaster’s model (the forecaster usually only has an estimate of the environment dynamics; it is likely that the forecaster’s model has a different order than the environment dynamics’ order . This can be handled in a straightforward manner, but the notation becomes cumbersome. Thus we will assume below, but explain how to handle in Appendix A),
are adversarial reference targets, are adversarial target weight matrices, and is the adversarial effort parameter. These quantities will be defined below.
Following control theory convention we introduce a vector-valued environment state representation (denoted by boldface). The first entry serves as an offset for the constant term in (2). We rewrite the environment dynamics under adversarial control (3) as . If is nonlinear, so is .
We introduce a separate forecaster state (denoted by hat): We let when . Because the forecaster employs an model, we have a linear forecast dynamics:
where we introduced the forecast matrix
We also vectorize adversarial reference target (denoted by bar): We simply let when : this is non-essential. In fact, for pairs that are uninteresting to the adversary, the target value can be undefined as they do not appear in the control cost later.
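One way to picture the forecast matrix introduced above is as a companion matrix acting on an augmented state. The sketch below assumes the state layout (1, newest value, ..., oldest value), which matches the remark that the first entry serves as an offset for the constant term; the function name `forecast_matrix` and the coefficients are hypothetical, and the paper's exact matrix may be arranged differently.

```python
import numpy as np

def forecast_matrix(coef):
    """Companion-form matrix for an AR(p) model with intercept.

    coef = [c, a_1, ..., a_p] (assumed layout). The matrix maps the state
    (1, x_t, ..., x_{t-p+1}) to (1, forecast of x_{t+1}, x_t, ..., x_{t-p+2}).
    """
    c, lags = coef[0], coef[1:]
    p = len(lags)
    M = np.zeros((p + 1, p + 1))
    M[0, 0] = 1.0                 # keep the constant entry equal to 1
    M[1, 0] = c                   # intercept enters through the constant entry
    M[1, 1:] = lags               # AR coefficients act on the p most recent values
    for i in range(2, p + 1):
        M[i, i - 1] = 1.0         # shift older values down by one slot
    return M

M = forecast_matrix([0.5, 0.6, -0.2])
state = np.array([1.0, 2.0, 1.0])                    # (1, x_t, x_{t-1})
one_step = M @ state                                  # forecast one step ahead
three_step = np.linalg.matrix_power(M, 3) @ state     # forecast three steps ahead
```

Under this layout, a k-step-ahead forecast is a matrix power applied to the current state, which is what makes the forecast dynamics linear in the environment state.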
Since the environment dynamics can be stochastic, the adversary must seek attack policies to map the observed state to an attack action:
The cost of the adversary consists of two parts: (1) how closely the adversary can force the forecaster’s predictions to match the adversarial reference targets; (2) how much control the adversary has to exert. These are represented by the two terms in the adversary’s quadratic cost function with an adversarial effort parameter chosen by the adversary:
where we define . A word is in order about the unusual summation range of (from 1 to ) in (7). We use to denote the prediction time horizon: is the last time index (expressed by ) to be predicted by the forecaster. Because the forecaster always predicts the future, namely in (see the text below (2)), the last value for is therefore . At the other end, although it is perfectly fine to talk about the forecasts made at the very beginning about a future time , such zero-time predictions are not controllable by the adversary: the adversary cannot change and thus cannot change the forecaster’s predictions made at time . In other words, are constants with respect to the control policies . As such, we remove them from the adversary’s objective function (7). This explains why in the first cost term starts from 1. We emphasize that the above discussion only concerns the cost function; the adversary seeks control policies for .
The matrices define adversarial targets. is a matrix with all zero entries except for a scalar weight at (2,2). simply picks out the element: , but we write it this way to conform to control theory convention. Critically, the adversary can use the weights to express different attack targets. For example:
If for a specific pair and 0 for all other pairs, the adversary is only interested in forcing the forecaster’s prediction made at time about future time to be close to the adversarial target . What happens to other predictions is irrelevant.
If for all and 0 otherwise, the adversary cares about the forecasts made at all times about the final time horizon . In this case, it is plausible that the adversarial target is a constant w.r.t. .
If for all and 0 otherwise, the adversary cares about the forecasts about “tomorrow.”
If , the adversary cares about all predictions made at all times.
The adversary can also express more complex temporal attack patterns, and can choose values between and to indicate weaker importance of certain predictions. Our formulation generalizes the work in (Alfeld et al., 2016).
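The weight patterns listed above can be made concrete with a small lookup table. The sketch below is illustrative only: it assumes forecasts are made at times t = 1, ..., T-1 about times t' with t < t' <= T, uses 0/1 weights, and the function name `target_weights` and pattern labels are hypothetical.

```python
def target_weights(pattern, T):
    """Return {(t_prime, t): weight} for forecasts made at time t about time t_prime.

    Assumes t ranges over 1..T-1 and t_prime over t+1..T, with 0/1 weights.
    """
    rule = {
        "last": lambda tp, t: tp == T,          # only forecasts about the final time
        "tomorrow": lambda tp, t: tp == t + 1,  # only one-step-ahead forecasts
        "all": lambda tp, t: True,              # every forecast matters
    }[pattern]
    return {(tp, t): (1.0 if rule(tp, t) else 0.0)
            for t in range(1, T)
            for tp in range(t + 1, T + 1)}

tomorrow = target_weights("tomorrow", T=4)
last = target_weights("last", T=4)
everything = target_weights("all", T=4)
```

Intermediate weights would be expressed the same way, by storing fractional values instead of 0 or 1 for the pairs that matter less.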
Given an adversarial state attack problem , we formulate it into the following optimal control problem:
where expectation is over in the objective function. We next propose solutions to this control problem for linear and nonlinear , respectively.
3.1 Solving Attacks Under Linear
When the environment dynamics is linear, the scalar environment state evolves as
where the coefficients in general can be different from the forecaster’s model (2). We introduce the corresponding vector operation
where has the same structure as in (5) except each is replaced by , and . The adversary’s attack problem (8) reduces to stochastic Linear Quadratic Regulator (LQR) with tracking, which is a fundamental problem in control theory (Kwakernaak & Sivan, 1972). It is well known that such problems have a closed-form solution, though the specific solution for stochastic tracking is often omitted from the literature. In addition, the presence of a forecaster in our case alters the form of the solution. Therefore, for completeness we provide the solution below.
Under the optimal control policy for problem (8) is given by:
The matrices and vectors are computed recursively by a generalization of the Riccati equation:
The proof is in Appendix B. Once the adversarial control policies are computed, the optimal attack sequence is given by: The astute reader will notice that by (11-13), . This is to be expected: affects , but would only affect forecasts after the prediction time horizon , which the adversary does not care about. To minimize the control effort, the adversary’s rational behavior is to set .
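For orientation, the textbook backward Riccati recursion that the solution above generalizes can be sketched as follows. This is the standard deterministic finite-horizon LQR without tracking targets, noise, or a forecaster, so it is not the paper's recursion; the function name `lqr_backward` and the example matrices are assumptions chosen for illustration.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, T):
    """Textbook finite-horizon LQR for x_{t+1} = A x_t + B u_t.

    Cost: sum_{t<T} (x_t' Q x_t + u_t' R u_t) + x_T' Qf x_T.
    Returns gains K_0..K_{T-1} (policy u_t = -K_t x_t) and cost-to-go matrices P_0..P_T.
    """
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Qf
    for t in range(T - 1, -1, -1):           # sweep backward in time
        BtP = B.T @ P[t + 1]
        K[t] = np.linalg.solve(R + BtP @ B, BtP @ A)
        P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])
    return K, P

# Hypothetical two-dimensional system with a scalar control input.
A = np.array([[0.5, 0.3], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), np.eye(2)
T = 5
K, P = lqr_backward(A, B, Q, R, Qf, T)

# Roll the closed loop forward and accumulate the realized cost.
x0 = np.array([1.0, -1.0])
x, cost = x0.copy(), 0.0
for t in range(T):
    u = -K[t] @ x
    cost += x @ Q @ x + u @ R @ u
    x = A @ x + B @ u
cost += x @ Qf @ x
predicted = x0 @ P[0] @ x0                   # cost-to-go predicted by the recursion
```

In the deterministic case the simulated cost and the value predicted by the first cost-to-go matrix agree, which is a convenient sanity check when implementing any variant of the recursion.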
3.2 Solving Attacks Under Non-Linear
When is nonlinear the optimal control problem (8) does not have a closed-form solution. Instead, we introduce an algorithm that combines Model Predictive Control (MPC) (Garcia et al., 1989; Kouvaritakis & Cannon, 2015) as the outer loop and Iterative Linear Quadratic Regulator (iLQR) (Li & Todorov, 2004) as the inner loop to find an approximately optimal attack. While these techniques are standard in the control community, to our knowledge our algorithm is a novel application of the techniques to adversarial learning.
The outer loop performs MPC, a common heuristic in nonlinear control. At each time, MPC performs planning by starting at , looking ahead steps and finding a good control sequence . However, MPC then carries out only the first control action . This action, together with the actual noise instantiation , drives the environment state to . Then, MPC performs the -step planning again but starting at , and again carries out the first control action . This process repeats. Formally, MPC iterates two steps:
1. Solve (16), where and the expectation is over . Denote the solution by .
2. Apply to the system.
The repeated re-planning allows MPC to adjust to new inputs, and provides some leeway if cannot be exactly solved, which is the case for our nonlinear .
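The plan, apply-first-action, re-plan loop can be sketched on a toy problem. To keep the planner exact and short, the sketch assumes scalar linear dynamics with an additive attack, a constant target, zero-noise planning, and a look-ahead equal to the number of remaining steps; a least-squares planner stands in for the inner solver. The names `plan` and `mpc` and all numbers are hypothetical.

```python
import numpy as np

def plan(a, x0, target, lam, H):
    """Zero-noise plan for x_{k+1} = a x_k + u_k over H steps.

    Minimizes sum_k (x_k - target)^2 + lam * u_{k-1}^2 for k = 1..H by least squares.
    """
    # G[k-1, j] is the effect of u_j on x_k.
    G = np.array([[a ** (k - 1 - j) if j < k else 0.0 for j in range(H)]
                  for k in range(1, H + 1)])
    free = np.array([a ** k * x0 for k in range(1, H + 1)])   # trajectory with u = 0
    lhs = np.vstack([G, np.sqrt(lam) * np.eye(H)])
    rhs = np.concatenate([target - free, np.zeros(H)])
    u, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return u

def mpc(a, x0, target, lam, T, noise):
    """Receding-horizon loop: plan from the current state, apply only the first action."""
    x, xs, us = x0, [x0], []
    for t in range(T):
        u_seq = plan(a, x, target, lam, T - t)   # look ahead over the remaining steps
        u = u_seq[0]                             # carry out only the first action
        x = a * x + u + noise[t]                 # environment moves with the actual noise
        us.append(u)
        xs.append(x)
    return np.array(xs), np.array(us)

T = 6
rng = np.random.default_rng(0)
xs, us = mpc(a=0.9, x0=1.0, target=0.0, lam=0.5, T=T, noise=0.1 * rng.standard_normal(T))
```

With zero noise the re-planned actions coincide with the initial open-loop plan; the benefit of re-planning shows up only once the realized noise pushes the state away from the planned trajectory.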
We now turn to the inner loop to approximately solve (16). There are two issues that make the problem hard: the expectation over noises , and the nonlinear . To address the first issue, we adopt an approximation technique known as “nominal cost” in (Kouvaritakis & Cannon, 2015). For planning we simply replace the random variables with their mean, which is zero in our case. This heuristic removes the expectation, and we are left with the following deterministic system as an approximation to (16):
To address the second issue, we adopt iLQR in order to solve (17). The idea of iLQR is to linearize the system around a trajectory, and compute an improvement to the control sequence using LQR iteratively. Concretely, given we initialize the control sequence in some heuristic manner: . If no good initialization for the control sequence is available, simply set it to . We then simulate the system in (18) using to obtain a state sequence :
While may not be sensible themselves, they allow us to perform a first order Taylor expansion of the nonlinear dynamics around them:
where and are and Jacobian matrices:
Note by definition. Rearranging and introducing new variables , , we have the relation
The solutions are then applied as an improvement to . We now have an updated (hopefully better) heuristic control sequence . We take and iterate the inner loop starting from (19) for further improvement.
Critically, what enables iLQR is the fact that (3.2) is now an LQR problem (i.e., linear) with a closed-form solution:
In the following, denote . The optimal solution of (3.2) is given by: for
where . The sequences are computed by:
The proof is given in Appendix C.
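The linearization step of the inner loop needs the Jacobians of the dynamics at each point of the current trajectory. When analytic derivatives are inconvenient, they can be approximated by central differences, as in the sketch below. The helper `jacobians` and the dynamics `toy_dynamics` are hypothetical and are not the dynamics used in the paper; the state layout (1, newest value, older value) is an assumption.

```python
import numpy as np

def jacobians(f, x, u, eps=1e-6):
    """Central-difference Jacobians of f(x, u) with respect to x and u."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n = np.asarray(f(x, u)).size
    Dx = np.zeros((n, x.size))
    Du = np.zeros((n, u.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        Dx[:, i] = (f(x + step, u) - f(x - step, u)) / (2 * eps)
    for j in range(u.size):
        step = np.zeros_like(u)
        step[j] = eps
        Du[:, j] = (f(x, u + step) - f(x, u - step)) / (2 * eps)
    return Dx, Du

def toy_dynamics(s, u):
    """Hypothetical nonlinear second-order dynamics on the state (1, x_t, x_{t-1})."""
    return np.array([1.0, 0.5 * np.sin(s[1]) + 0.3 * s[2] + u[0], s[1]])

Dx, Du = jacobians(toy_dynamics, x=[1.0, 0.0, 2.0], u=[0.0])
```

Around a nominal pair of state and control, the dynamics are then approximated by the nominal next state plus `Dx` times the state deviation plus `Du` times the control deviation, which is the linear system handed to LQR in each iLQR iteration.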
3.3 A Greedy Control Policy as the Baseline State Attack Strategy
The optimal state attack objective (8) can be rewritten as a running sum of instantaneous costs. At time the instantaneous cost involves the adversary’s attack effort , the attack’s immediate effect on the environment state (see Figure 1), and consequently on all the forecaster’s predictions made at time about time . Specifically, the expected instantaneous cost at time is defined as:
This allows us to define a greedy control policy , which is easy to compute and will serve as a baseline for state attacks. In particular, the greedy control policy at time minimizes the instantaneous cost:
When is linear, can be obtained in closed-form.
The greedy control policy at time is
The proof is given in Appendix D. When , is and .
When is nonlinear, we let noise and solve the following nonlinear problem using numerical solvers:
We now demonstrate the effectiveness of adversarial attacks computed by control theory methods on both synthetic and real-world time series forecast problems. We compare the optimal attacks computed by LQR (for linear ) and MPC+iLQR (for nonlinear ) against the greedy control attacks of Section 3.3, and also against the no-attack baseline. While the attack actions were optimized under an expectation over random noise (cf. (8)), in the experiments we report the actual realized cost, which is practically more relevant. Specifically, the actual realized cost is computed based on the actual noise instantiation that the algorithm experienced, and is defined as:
where the noise sequence is incorporated implicitly in , together with the actual attack sequence .
4.1 Effect of Attack Target Selection by
In our first synthetic example we demonstrate the adversary’s ability to attack different parts of the forecasts via , the quadratic coefficient in cost function. Figure 2 illustrates three choices of attack targets :
1. “Tomorrow”: The attacker only cares about the forecasts made on about the immediate next time step. That is, at the attacker wants to force the forecast of time step 2, namely , to be close to the adversarial reference target ; at force the forecast of time step 3 to be close to , and so on. This is done by setting in for , and letting all other be zero.
2. “Last day”: the attacker only cares about predictions made on .
3. “All predictions”: the attacker wants to match any prediction to its reference target for and .
For simplicity, we let all adversarial reference targets be constant 1. We let the environment evolve according to an model: . Equivalently, in matrix notation. We let the noise and the initial state . We simulate the case where the forecaster has only an approximation of the environment dynamics, and let the forecaster’s model be which is close to, but different from, the environment dynamics. The equivalent matrix . For illustrative purposes, we set the prediction time horizon . Recall that the attacker can change the environment state by adding perturbation : . To make the balance between attack effect (the quadratic terms involving ) and attack effort (the terms) more interpretable, we let and set .
We run LQR and obtain optimal attack sequences under each scenario. They are visualized in Figure 2. We then compute the actual realized cost with these different sequences as a sanity check in Table 1. As expected, under each the LQR attack optimized for that performs better than the attacks optimized for the other ’s.
In conclusion, different target selections lead to different optimal attack sequences.
4.2 Effect of Attack Effort Weight
Using the same synthetic task and attack scenario “tomorrow”, we show that the attack effort weight dramatically affects the optimal attack policy. Previously . Now we show LQR with more extreme values and in Figure 3.
When is small the attacker can afford to drive the predictions very close to the attack target by using large actions . In contrast, a large effectively prevents the adversary from attacking at all, so the outcome is similar to no-attack.
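The two regimes can be seen in a one-step scalar toy problem. The symbols here are assumptions introduced only for this illustration: $\hat y$ is the forecast that would be made without an attack, $a$ is the forecaster's lag-1 coefficient (so an additive attack $u$ shifts the forecast by $a u$), $r$ is the reference target, $\beta > 0$ is the target weight, and $\lambda > 0$ is the effort weight.

```latex
\begin{align*}
  u^\star &= \arg\min_{u}\; \beta\,(\hat y + a u - r)^2 + \lambda u^2 \\
  0 &= 2\beta a\,(\hat y + a u^\star - r) + 2\lambda u^\star
  \quad\Longrightarrow\quad
  u^\star = -\frac{\beta a\,(\hat y - r)}{\beta a^2 + \lambda} \\
  \lambda \to 0 &:\quad u^\star \to -\frac{\hat y - r}{a}
  \quad\text{(the forecast lands exactly on the target)} \\
  \lambda \to \infty &:\quad u^\star \to 0
  \quad\text{(the outcome approaches the no-attack case)}
\end{align*}
```

This single-step calculation ignores the effect of the attack on later forecasts, so it only indicates the direction of the trade-off rather than the LQR solution itself.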
4.3 Comparing LQR vs. Greedy Attack Policies
We now show that the LQR attack policy is better than the greedy attack policy. The environment evolves by an model: . Equivalently, in matrix notation. The initial values are , prediction horizon . The forecaster’s model is . Equivalently, in matrix notation. is about “tomorrow”. The environment dynamics oscillates around . The attacker wants the forecaster to predict a sequence oscillating with smaller amplitude. The sequence is set as follows: let , then let the attack reference target be . We set . LQR and Greedy are used to solve this attack problem. We generate trials with different noise sequences and run both LQR and Greedy for each noise sequence; see Figure 4.
Interestingly, LQR drives the predictions close to the attack target, while Greedy diverges. The mean actual realized costs of no attack, LQR attack, and Greedy attack are , respectively. The standard errors are . We perform a paired t-test on LQR vs. Greedy. The null hypothesis of equal means is rejected with . This clearly demonstrates the myopic failure of the greedy policy.
4.4 MPC+iLQR Attack on Nonlinear Dynamic
We now show that MPC+iLQR can attack nonlinear environment dynamics. We use the same dynamics as in (Fan & Yao, 2008) but we change the noise to be . The dynamics is