Ordinary differential equations relating the applied input forces to the system state variables are commonly employed in modeling structural dynamic systems. Direct measurements of the system state variables are not always possible; instead, noisy observations that are functions of the state variables are obtained at discrete time instants. State estimation addresses the problem of obtaining optimal estimates of the unobserved system states from noisy measurements, given knowledge of the system model and the dynamic input forces. This problem is at the core of many engineering applications, ranging from condition monitoring and prediction of stresses to feedback control and performance evaluation. The Kalman filter and its variants have been widely used for state estimation of a broad class of both linear and nonlinear dynamical systems [1, 2, 3].
A significant challenge in dynamical state estimation arises from inadequate knowledge of the inputs acting on the system. In many civil engineering systems such as buildings or bridges, measuring the operational external forces (wind loads, vehicular loads, etc.) is not practical. A commonly adopted approach in such cases is to assume that the unmeasured input force is a zero-mean white noise process and then apply filtering techniques for state estimation; however, in many cases the actual inputs violate this white noise assumption, potentially yielding poor estimation results. Thus, there is a need to inversely determine the unmeasured input forces. Input estimation, also known as force identification, involves reconstructing the latent or unmeasured dynamic input forces given a limited number of sensor measurements and knowledge of the system model, obtained either from a finite element model or inferred a priori using system identification methods. Cases where input estimation and state estimation are performed together are referred to as joint input-state estimation.
Originally, force identification problems were treated separately from state estimation problems, and the earliest algorithms used frequency response functions to inversely compute the unknown forces [4, 5]. Subsequently, a wide variety of force identification methods were proposed, in both the time domain and the frequency domain (see  for a recent survey on force reconstruction techniques). Among time-domain approaches, two distinct modelling frameworks exist: deterministic [7, 8] and stochastic [9, 10, 11]. The stochastic time-domain approaches differ from their deterministic counterparts in the way uncertainty, in the form of noise, is incorporated in the system and measurement models; the state and/or input estimates obtained under the stochastic framework are characterized by probability distributions.
Recently, there has been considerable interest in developing and applying stochastic joint input-state estimation algorithms. Gillijns and De Moor  proposed a minimum-variance unbiased filter for joint input and state estimation of linear systems without a direct transmission term, where the input estimation and state estimation are performed in two independent, sequential steps. This algorithm was extended to include the direct transmission term and was later applied to structural dynamics by Lourens et al.  for use with reduced-order models (i.e., models obtained using a relatively small number of modes). Lourens et al.  also proposed an augmented Kalman filter (AKF) that appends the unknown forces to the state vector to jointly estimate the dynamic forces and system states using a classical Kalman filter. This method produces spurious low-frequency components in the force and displacement estimates when only acceleration measurements are used. An analytical investigation performed in  showed that the augmented state-space formulation of the AKF is susceptible to numerical instability and unobservability issues when only acceleration measurements are used. In an effort to overcome these shortcomings, Naets et al.  proposed the use of artificial displacement measurements to stabilize systems with only acceleration measurements. This algorithm, known as the AKF with dummy measurements (AKFdm), is similar to the one proposed by Chatzi and Fuggini  for alleviating the low-frequency drifts in displacement estimates needed for structural monitoring purposes. Azam et al.  proposed a dual Kalman filter (DKF) for state and input estimation using only acceleration measurements. Unlike the AKF formulation, where the input and states are jointly predicted and updated, the DKF follows a successive structure of implementation, in that the input prediction and update are followed by the state prediction and update. Owing to this separation of the estimation process into two successive steps, the DKF typically overcomes the unobservability issue. Maes et al.  proposed a modified AKF which directly accounts for the correlation between the process noise and the measurement noise.
An attractive feature of the above Kalman filtering-based joint input-state estimation methods lies in their amenability to online implementation. That said, their accuracy in force reconstruction critically depends on prior tuning of the covariance matrices associated with the unknown inputs. For example, in the AKF and DKF algorithms the covariance matrix of the unknown input must be tuned, while in the AKFdm a covariance matrix for the dummy displacements must be tuned in addition to the input covariance. In the absence of expert knowledge of the covariance matrices, a trial-and-error approach or a heuristic rule of thumb is often adopted to tune the input covariance matrices prior to implementation. This may not be possible when there is very little prior knowledge about the input and/or the states of the system. The tacit approach followed in most cases is to have an offline tuning phase, where response data collected over a certain time duration is used to obtain optimal values of the covariance matrices, prior to the online estimation phase with Kalman filtering.
In 2009, Alvarez et al.  proposed a Bayesian learning technique termed the latent force model (LFM), which allows flexible modelling of unknown inputs (or latent forces) in a system. The unknown or latent forces are modelled using a non-parametric Gaussian process (GP) model  with some chosen covariance functions. The choice of covariance function provides flexibility to incorporate generic prior knowledge about the behavior of the inputs (e.g., random, periodic, smoothly varying). A Gaussian process based LFM, GPLFM in short, is then constructed by incorporating the physics-based mathematical model of the system (e.g., its differential equations) within the GP covariance functions to infer the inputs to the system. The behavior of the latent forces depends on the parameters (also called hyperparameters) of the chosen GP covariance functions, which constitute the tunable parameters of this model. These hyperparameters can be set based on expert knowledge, or they can be tuned by optimization using observed data. Hartikainen and Särkkä  and more recently Särkkä et al.  advocated a state-space approach to GPLFM inference wherein all information about the system and the GP force model is fully captured in an augmented state-space representation, thus making the inference amenable to Bayesian filtering and smoothing methods. Once the hyperparameters of the GPLFM are computed, it can be adapted for online implementation with a Kalman filter.
This paper introduces GPLFM for joint input and state estimation of linear time-invariant structural systems. To the authors’ knowledge, this is the first application of GPLFM in the domain of structural dynamics. Additionally, the following are the key contributions of this paper:
Unlike the AKF, the GPLFM formulation is analytically proved to be observable and is shown to be robust against drift in force estimation even when used with only acceleration measurements.
A maximum likelihood optimization based on the observed data is used to compute the optimal hyperparameters of the GPLFM, which eliminates the need for tuning the parameters heuristically or through trial-and-error.
An in-depth numerical study is also presented comparing the performance of the proposed GPLFM algorithm with the existing AKF, AKFdm, and DKF algorithms.
The paper is organized as follows. Section 2 presents the mathematical model of the structure and outlines the objective of this study. Next, a concise background on the GPLFM and its use in regression is provided in Section 3. In Section 4, the GPLFM is formulated as a linear state-space model, and a procedure for joint posterior inference on the inputs and states is discussed. Section 4.5 presents the GPLFM as a generalization of the augmented state-space models used in joint input-state estimation and provides proofs of observability and stability of the GPLFM. Section 6 presents numerical studies using a 10-dof shear building, illustrating the performance of the GPLFM under different excitation and measurement scenarios. The paper concludes with an application of the proposed method to a 76-storey ASCE benchmark office tower for joint input and state estimation.
2 Problem Formulation
2.1 Mathematical model of structural system
The equation of motion of an $n$ degrees-of-freedom (dof) linear structural system under forced excitation, including ground motions, can be represented by the following second-order ordinary differential equation (obtained after discretization in space):
$$\mathbf{M}\ddot{\mathbf{u}}(t) + \mathbf{C}\dot{\mathbf{u}}(t) + \mathbf{K}\mathbf{u}(t) = \mathbf{S}_p\,\mathbf{p}(t) - \mathbf{M}\boldsymbol{\Gamma}\,\ddot{u}_g(t) \qquad (1)$$
where $\mathbf{u}(t) \in \mathbb{R}^{n}$ is the vector of displacements at the $n$ degrees of freedom, and $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K} \in \mathbb{R}^{n \times n}$ represent the mass, damping, and stiffness matrices of the structural system, respectively. The external forces acting on the structure are represented by a combination of the external loads acting on the structure and the forces generated due to the earthquake ground motion $\ddot{u}_g(t)$. The first term, $\mathbf{S}_p\,\mathbf{p}(t)$, denotes the product of a load influence matrix $\mathbf{S}_p$ and the vector $\mathbf{p}(t)$ representing the external load time histories, while the second term represents the forces generated in the structure due to the earthquake ground motion, where $\boldsymbol{\Gamma}$ is the matrix of influence coefficients which specifies the dofs affected by the ground motion. In cases where the size of the structural model is very large, it is common to use modal reduced-order models.
Defining a state vector $\mathbf{x}(t)$ and a force vector $\mathbf{f}(t)$ as
$$\mathbf{x}(t) = \begin{bmatrix}\mathbf{u}(t)\\ \dot{\mathbf{u}}(t)\end{bmatrix} \in \mathbb{R}^{2n}, \qquad \mathbf{f}(t) = \begin{bmatrix}\mathbf{p}(t)\\ \ddot{u}_g(t)\end{bmatrix}$$
the set of second-order differential equations in Equation 1 can be converted to a set of first-order differential equations called the continuous-time state-space equation
$$\dot{\mathbf{x}}(t) = \mathbf{A}_c\,\mathbf{x}(t) + \mathbf{B}_c\,\mathbf{f}(t) \qquad (3)$$
where the continuous-time system matrices $\mathbf{A}_c$ and $\mathbf{B}_c$ are defined as
$$\mathbf{A}_c = \begin{bmatrix}\mathbf{0} & \mathbf{I}\\ -\mathbf{M}^{-1}\mathbf{K} & -\mathbf{M}^{-1}\mathbf{C}\end{bmatrix}, \qquad \mathbf{B}_c = \begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{M}^{-1}\mathbf{S}_p & -\boldsymbol{\Gamma}\end{bmatrix}$$
and $\mathbf{I}$ is the identity matrix of appropriate dimension. Consider next the measurement equation, where it is assumed that a combination of displacements, velocities and accelerations can be measured. Hence the output vector $\mathbf{y}(t)$, containing the measured quantities, assumes the following form
$$\mathbf{y}(t) = \mathbf{S}_d\,\mathbf{u}(t) + \mathbf{S}_v\,\dot{\mathbf{u}}(t) + \mathbf{S}_a\,\ddot{\mathbf{u}}(t) \qquad (5)$$
where $\mathbf{S}_d$, $\mathbf{S}_v$, and $\mathbf{S}_a$ are the selection matrices for displacements, velocities and accelerations, respectively. Using Equation 1 and the defined state and force vectors, Equation 5 can be written in the state-space form
$$\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{D}\,\mathbf{f}(t) \qquad (6)$$
where the output influence matrix $\mathbf{C}$ and direct transmission matrix $\mathbf{D}$ are
$$\mathbf{C} = \begin{bmatrix}\mathbf{S}_d - \mathbf{S}_a\mathbf{M}^{-1}\mathbf{K} & \mathbf{S}_v - \mathbf{S}_a\mathbf{M}^{-1}\mathbf{C}\end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix}\mathbf{S}_a\mathbf{M}^{-1}\mathbf{S}_p & \mathbf{0}\end{bmatrix}$$
Note that the second column of the direct transmission term corresponds to the earthquake ground motion input and takes zero values, as the direct contribution of the earthquake ground motion vanishes when measuring absolute accelerations (see the derivation in D).
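As an illustration, these state-space matrices can be assembled numerically. The sketch below uses a hypothetical 2-dof shear model (all numerical values are illustrative, not from the paper) and checks that the ground-motion column of the direct transmission matrix vanishes when absolute accelerations are measured: the $-\boldsymbol{\Gamma}$ contribution from the relative acceleration cancels the $+\boldsymbol{\Gamma}\,\ddot{u}_g$ added when converting to absolute acceleration.

```python
import numpy as np

# Hypothetical 2-dof shear model (illustrative values only)
n = 2
M = np.eye(n)                                   # mass matrix
K = np.array([[40.0, -20.0], [-20.0, 20.0]])    # stiffness matrix
Cd = 0.02 * K                                   # damping (stiffness-proportional)
Sp = np.array([[1.0], [0.0]])                   # load influence: force at dof 1
Gamma = np.ones((n, 1))                         # ground-motion influence

Minv = np.linalg.inv(M)
Ac = np.block([[np.zeros((n, n)), np.eye(n)],
               [-Minv @ K, -Minv @ Cd]])
Bc = np.block([[np.zeros((n, 2))],
               [np.hstack([Minv @ Sp, -Gamma])]])

# Measure accelerations at all dofs (Sa = I, Sd = Sv = 0)
Sa = np.eye(n)
C_out = np.hstack([-Sa @ Minv @ K, -Sa @ Minv @ Cd])
D_rel = np.hstack([Sa @ Minv @ Sp, -Sa @ Gamma])           # relative accelerations
D_abs = D_rel + np.hstack([np.zeros((n, 1)), Sa @ Gamma])  # absolute = relative + Gamma*ug_ddot
```

The ground-motion column of `D_abs` is identically zero, matching the remark above, while the same column of `D_rel` is not.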
Equations 3 and 6 form the continuous-time state-space model (SSM) for the system described by Equation 1. In practical applications, continuous-time outputs are of course not observed; instead, noisy measurements are obtained via sampling of the system responses at discrete time instants. Assuming a sampling interval $\Delta t$, discrete time instants are defined as $t_k = k\,\Delta t$ for $k = 1, \dots, N$, and a discrete measurement model can be expressed as
$$\mathbf{y}_k = \mathbf{C}\,\mathbf{x}(t_k) + \mathbf{D}\,\mathbf{f}(t_k) + \mathbf{v}_k \qquad (8)$$
where $\mathbf{v}_k$ is the measurement noise vector. Combined together, Equations 3 and 8 represent an SSM with continuous-time dynamics and discrete-time measurements, known as the continuous-discrete state-space model (see , pg. 170 or , pg. 93)
$$\dot{\mathbf{x}}(t) = \mathbf{A}_c\,\mathbf{x}(t) + \mathbf{B}_c\,\mathbf{f}(t) + \mathbf{w}(t)$$
$$\mathbf{y}_k = \mathbf{C}\,\mathbf{x}_k + \mathbf{D}\,\mathbf{f}_k + \mathbf{v}_k$$
where $\mathbf{x}_k = \mathbf{x}(t_k)$ and $\mathbf{f}_k = \mathbf{f}(t_k)$. The terms $\mathbf{w}(t)$ and $\mathbf{v}_k$ are the process and the measurement noise vectors, typically added to account for modelling errors and measurement errors, respectively. They are assumed zero-mean Gaussian white noise with covariances
$$\mathrm{E}\big[\mathbf{w}(t)\,\mathbf{w}(\tau)^{\mathsf{T}}\big] = \mathbf{Q}\,\delta(t-\tau), \qquad \mathrm{E}\big[\mathbf{v}_k\,\mathbf{v}_l^{\mathsf{T}}\big] = \mathbf{R}\,\delta_{kl}$$
with $\mathbf{Q} \succeq \mathbf{0}$ and $\mathbf{R} \succ \mathbf{0}$.
Having elaborated on the mathematical model of the structure of interest, attention is now focused on the problem of stochastic joint input and state estimation. Here, both the inputs and the system states are taken to be unknown sequences of Gaussian random variables, and the goal lies in inferring the joint posterior distribution of the inputs and states from a set of noisy measurements, given a priori knowledge of the structural model parameters. In pursuit of this goal, the objective of this paper is to implement a Gaussian process latent force model which improves upon existing methods by reducing the dependency on manual tuning of the covariance matrices associated with the unknown inputs, and which also provides numerical stability with respect to observability of the augmented state-space formulation when used with only acceleration measurements. The location of the inputs is assumed to be known; studies on input localization can be found in [26, 27, 28].
3 Background on Gaussian process latent force models
3.1 Brief overview of Gaussian processes
Gaussian processes (GPs) [29, 21, 30] are a class of stochastic processes that provide a paradigm for specifying probability distributions over functions. Gaussian processes have been widely studied and used in many different fields such as signal processing , geostatistics [32, 33] and inverse problems ; for instance, in the geostatistics literature Gaussian process regression is known as Kriging. A Gaussian process is a generalization of the Gaussian probability distribution: while a probability distribution is defined over random variables which are scalars or vectors (for multivariate distributions), a stochastic process governs the properties of functions. Consider an independent variable $x$ and a function $f(x)$. Then a GP defined over $f(x)$, with mean function $m(x)$ and covariance function $\kappa(x, x'; \boldsymbol{\theta})$, is written as
$$f(x) \sim \mathcal{GP}\big(m(x),\, \kappa(x, x'; \boldsymbol{\theta})\big) \qquad (12)$$
where $\boldsymbol{\theta}$ denotes the hyperparameters of the covariance function $\kappa$. The choice of the covariance function allows encoding any prior knowledge about $f(x)$ (e.g., periodicity, linearity, smoothness), and can accommodate approximation of arbitrarily complex functions . The notation in Equation 12 implies that any finite collection of function values has a joint multivariate Gaussian distribution, that is, $\mathbf{f} = [f(x_1), \dots, f(x_N)]^{\mathsf{T}} \sim \mathcal{N}(\mathbf{m}, \mathbf{K})$, where $\mathbf{m}$ is the mean vector and $\mathbf{K}$ is the covariance matrix with entries $K_{ij} = \kappa(x_i, x_j)$ for $i, j = 1, \dots, N$. If no prior information is available about the mean function, it is generally set to zero, i.e. $m(x) = 0$. For the covariance function, any function that generates a positive semi-definite covariance matrix is a valid covariance function. For example, the squared exponential covariance function has the form
$$\kappa(x, x') = \sigma^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right)$$
where $\boldsymbol{\theta} = \{\sigma, \ell\}$ are the hyperparameters of the covariance function. The mean and covariance functions dictate how the random functions behave on average and how function values at different points in the input space vary with respect to each other. The covariance function thus encodes a correlation structure which introduces inter-dependencies between function values at different inputs.
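For concreteness, the squared exponential covariance can be evaluated over a grid of inputs. The short sketch below (function and parameter names are ours) builds the covariance matrix $\mathbf{K}$:

```python
import numpy as np

# Squared exponential covariance: k(x, x') = sigma^2 exp(-(x - x')^2 / (2 ell^2))
# (function and parameter names are illustrative)
def se_kernel(x1, x2, sigma=1.0, ell=1.0):
    """Covariance matrix with entries k(x1[i], x2[j])."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * ell**2))

x = np.linspace(0.0, 1.0, 5)
K = se_kernel(x, x)
```

Points close together relative to the lengthscale `ell` receive covariance near `sigma**2`, while distant points are nearly uncorrelated; this is how the kernel encodes smoothness. The resulting matrix is symmetric positive semi-definite, as required of a valid covariance function.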
3.2 Gaussian process regression
GPs are often used to solve regression problems [34, 35, 36]. In general, regression problems are concerned with the estimation of the values of a dependent variable $y$ observed at certain values of an independent variable $x$, given a set of noisy measurements $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$. The relation can be written as
$$y = f(x) + \varepsilon$$
where $\varepsilon$ is noise. In other words, regression involves estimating the latent (unobserved) function $f$ that will enable prediction of $y$ at new values of $x$.
One perspective for looking at GPs representing the input-output relation is called the function-space view, given in Rasmussen and Williams . Unlike traditional modelling approaches which rely on fitting a parameterized mathematical form to approximate the input-output relation, a GP does not assume any explicit form of $f$; rather, a prior belief (in the form of the mean function and the covariance function) is placed on the space of all possible functions. GPs belong to the class of ‘non-parametric’ models because the number of parameters in the model is not fixed, but rather determined by the number of data points. Upon data collection, the posterior distribution over $f$ is updated according to Bayes’ rule. The candidate functions represented by GPs can express linear or nonlinear relationships between the input $x$ and the output $y$.
It is to be remarked at this point that the input dimension $d$ (the dimension of the independent variable $x$) is usually greater than 1; however, in this work the primary focus is on performing inference using time-series data where time is the only independent input variable. Thus $d = 1$, and $x$ is hereafter replaced by $t$, referred to as the time domain. Examples where the input dimension is greater than one ($d > 1$) are situations where the input variables represent time and one-dimensional space ($d = 2$), space in two dimensions ($d = 2$), or even time with two-dimensional space ($d = 3$).
Now consider the prediction of the value of $f$ at a test point $t_*$, based on a previously measured set of outputs which are corrupted with white Gaussian noise:
$$y_i = f(t_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_n^2), \qquad i = 1, \dots, N$$
Assume that $f(t)$ is modelled as a Gaussian process with zero mean and a covariance function, that is
$$f(t) \sim \mathcal{GP}\big(0,\, \kappa(t, t'; \boldsymbol{\theta})\big)$$
where $\kappa(t, t'; \boldsymbol{\theta})$ is the covariance function with hyperparameters $\boldsymbol{\theta}$. Under the Gaussian process assumption, the joint distribution between $\mathbf{y} = [y_1, \dots, y_N]^{\mathsf{T}}$ and $f(t_*)$ is Gaussian, and can be expressed as
$$\begin{bmatrix}\mathbf{y}\\ f(t_*)\end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{bmatrix}\mathbf{K} + \sigma_n^2\mathbf{I} & \mathbf{k}_*\\ \mathbf{k}_*^{\mathsf{T}} & \kappa(t_*, t_*)\end{bmatrix}\right)$$
where $\mathbf{t} = [t_1, \dots, t_N]^{\mathsf{T}}$ is the vector consisting of the time points, $\mathbf{y}$ is the vector of measured outputs, $\mathbf{K}$ is the covariance matrix comprising elements $K_{ij} = \kappa(t_i, t_j)$, and $\mathbf{k}_*$ is a vector with elements $\kappa(t_i, t_*)$.
Using the properties of multivariate Gaussian distributions, it can be shown that the posterior distribution of $f(t_*)$ given the dataset $\mathcal{D}$ and hyperparameters $\boldsymbol{\theta}$ is also Gaussian,
$$f(t_*) \mid \mathcal{D}, \boldsymbol{\theta} \sim \mathcal{N}(\mu_*, \sigma_*^2)$$
with mean and variance
$$\mu_* = \mathbf{k}_*^{\mathsf{T}}\big(\mathbf{K} + \sigma_n^2\mathbf{I}\big)^{-1}\mathbf{y}, \qquad \sigma_*^2 = \kappa(t_*, t_*) - \mathbf{k}_*^{\mathsf{T}}\big(\mathbf{K} + \sigma_n^2\mathbf{I}\big)^{-1}\mathbf{k}_*
$$
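The posterior mean and variance above translate directly into a few lines of linear algebra. The sketch below (toy data and kernel settings are our own, assuming the squared exponential covariance) predicts $f(t_*)$ from noisy samples of a sine wave:

```python
import numpy as np

def se_kernel(a, b, sigma=1.0, ell=0.3):
    # squared exponential covariance (illustrative hyperparameter values)
    return sigma**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * ell**2))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)                       # training time points
y = np.sin(2.0 * np.pi * t) + 0.05 * rng.standard_normal(t.size)
sn = 0.05                                           # noise standard deviation

K = se_kernel(t, t) + sn**2 * np.eye(t.size)        # K + sigma_n^2 I
t_star = np.array([0.5])
k_star = se_kernel(t, t_star)                       # vector k_*

mu_star = k_star.T @ np.linalg.solve(K, y)          # posterior mean
var_star = se_kernel(t_star, t_star) - k_star.T @ np.linalg.solve(K, k_star)
```

Note the posterior variance is the prior variance minus a non-negative data-dependent term, so observing data can only reduce the predictive uncertainty.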
3.3 Latent force models using Gaussian processes
Latent force models [20, 37] are a hybrid modelling approach which combines mechanistic models (i.e., models based on physical laws) with non-parametric components such as GPs. Consider the following linear continuous-time state-space equations obtained from Equations 3 and 8
$$\dot{\mathbf{x}}(t) = \mathbf{A}_c\,\mathbf{x}(t) + \mathbf{B}_c\,\mathbf{f}(t) \qquad (21)$$
$$\mathbf{y}_k = \mathbf{C}\,\mathbf{x}(t_k) + \mathbf{D}\,\mathbf{f}(t_k) + \mathbf{v}_k \qquad (22)$$
with all components of the unknown input vector $\mathbf{f}(t)$ being modelled as zero-mean independent Gaussian processes
$$f_i(t) \sim \mathcal{GP}\big(0,\, \kappa_i(t, t')\big), \qquad i = 1, \dots, n_f$$
Here $f_i(t)$ is the $i$th component of $\mathbf{f}(t)$. The assumption of zero mean leads to no loss of generality. Note that the hyperparameters of the covariance functions have been omitted for notational compactness. The model, given by Equations 21 and 22, leads to a (linear) latent force model, in which the basic idea is to combine the mechanistic state-space model with unknown (latent) forcing functions modelled by non-parametric Gaussian processes.
The solution of the state $\mathbf{x}(t)$ in the differential equation in Equation 21 can be obtained as
$$\mathbf{x}(t) = e^{\mathbf{A}_c (t - t_0)}\,\mathbf{x}(t_0) + \int_{t_0}^{t} e^{\mathbf{A}_c (t - \tau)}\,\mathbf{B}_c\,\mathbf{f}(\tau)\, d\tau$$
where $e^{(\cdot)}$ denotes the matrix exponential and $\mathbf{x}(t_0)$ is the initial condition of the state at time $t_0$. The state covariance matrix function can then be calculated as
$$\mathbf{P}_x(t, t') = e^{\mathbf{A}_c (t - t_0)}\,\mathbf{P}_0\, e^{\mathbf{A}_c^{\mathsf{T}} (t' - t_0)} + \int_{t_0}^{t}\!\int_{t_0}^{t'} e^{\mathbf{A}_c (t - \tau)}\,\mathbf{B}_c\,\mathbf{K}_f(\tau, \tau')\,\mathbf{B}_c^{\mathsf{T}}\, e^{\mathbf{A}_c^{\mathsf{T}} (t' - \tau')}\, d\tau'\, d\tau$$
where $\mathbf{P}_0$ is the prior covariance matrix of $\mathbf{x}(t_0)$ and $\mathbf{K}_f(\tau, \tau')$ is the joint covariance matrix of all latent forcing functions between time instants $\tau$ and $\tau'$; it is a diagonal matrix due to the assumption of independence across all force components. Observe that the states form a multi-dimensional GP
and thus both $\mathbf{x}(t)$ and $\mathbf{f}(t)$ are GPs. Next, the measurement equation is rewritten as
$$\mathbf{y}_k = \mathbf{C}\,\mathbf{x}(t_k) + \mathbf{D}\,\mathbf{f}(t_k) + \mathbf{v}_k$$
Since a linear combination of Gaussian processes is a Gaussian process, $\mathbf{y}$ is also a Gaussian process, and its covariance can be expressed in terms of $\mathbf{P}_x$ and $\mathbf{K}_f$, which are defined by Equations 25 and 24, respectively. A derivation can be found in C. Thus, one can replace the problem defined by Equations 21 and 22 with an equivalent Gaussian process regression problem (Equation 31).
This approach, introduced by Alvarez et al. [20, 37], reduces the latent force model in Equations 21 and 22 to a GP regression problem as expressed by Equation 31. However, the approach requires analytical computation of the covariance matrix functions, which may not always be obtainable in closed form and may need to be computed via numerical integration. In the next section, a formulation to convert covariance functions to state-space form for a certain class of GP covariance functions is considered. In that formulation, numerical integration is avoided altogether and replaced with matrix exponential computations and solutions of Lyapunov equations.
4 GPLFM as stochastic SSMs for sequential inference
This section discusses how temporal GPs with a certain class of covariance functions can be reformulated as SSMs and demonstrates how they can be combined with linear LFMs for sequential posterior inference of the latent inputs and states. It must be emphasized at this point that only stationary covariance functions, which have the form $\kappa(t, t') = \kappa(\tau)$ where $\tau = t - t'$, are considered in this work.
4.1 State space representation of temporal GPs
Consider a scalar-valued GP $f(t)$ with a stationary covariance function $\kappa(\tau)$, whose realizations satisfy the following $m$th-order linear differential equation driven by white noise:
$$\frac{d^m f(t)}{dt^m} + a_{m-1}\frac{d^{m-1} f(t)}{dt^{m-1}} + \dots + a_1\frac{d f(t)}{dt} + a_0 f(t) = w(t) \qquad (33)$$
where $a_0, \dots, a_{m-1}$ are known constants and $w(t)$ is a white noise process with spectral density $q$. Equation 33 can be written in state-space form
$$\dot{\mathbf{z}}(t) = \mathbf{F}\,\mathbf{z}(t) + \mathbf{L}\,w(t), \qquad f(t) = \mathbf{H}\,\mathbf{z}(t) \qquad (34)$$
where the state $\mathbf{z}(t)$ and the matrices $\mathbf{F}$, $\mathbf{L}$ and $\mathbf{H}$ are given by:
$$\mathbf{z}(t) = \begin{bmatrix} f(t)\\ \frac{df(t)}{dt}\\ \vdots\\ \frac{d^{m-1}f(t)}{dt^{m-1}} \end{bmatrix}, \quad \mathbf{F} = \begin{bmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1\\ -a_0 & -a_1 & \cdots & -a_{m-1} \end{bmatrix}, \quad \mathbf{L} = \begin{bmatrix}0\\ \vdots\\ 0\\ 1\end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix}1 & 0 & \cdots & 0\end{bmatrix}$$
The discrete-time form of the LTI model in Equation 34 is given by
$$\mathbf{z}_k = \mathbf{A}\,\mathbf{z}_{k-1} + \mathbf{q}_{k-1}, \qquad f_k = \mathbf{H}\,\mathbf{z}_k \qquad (36)$$
where $\mathbf{A} = e^{\mathbf{F}\,\Delta t}$ and $\mathbf{q}_{k-1}$ is the discrete-time Gaussian process noise. This formulation allows GP regression problems to be solved sequentially with Kalman filtering and smoothing techniques. The iteration can be started with the initial state $\mathbf{z}_0 \sim \mathcal{N}(\mathbf{0}, \mathbf{P}_0)$. Note that the model (and the corresponding covariance function) is stationary, and therefore has a steady-state solution distributed as $\mathcal{N}(\mathbf{0}, \mathbf{P}_\infty)$, where the steady-state covariance matrix $\mathbf{P}_\infty$ can be obtained as the solution of the continuous-time Lyapunov equation
$$\mathbf{F}\,\mathbf{P}_\infty + \mathbf{P}_\infty\,\mathbf{F}^{\mathsf{T}} + \mathbf{L}\,q\,\mathbf{L}^{\mathsf{T}} = \mathbf{0} \qquad (37)$$
Since the steady state is invariant with respect to time, the initial covariance matrix is set equal to the steady-state covariance, $\mathbf{P}_0 = \mathbf{P}_\infty$.
Using the SSM given by Equation 34, the power spectral density of $f(t)$ can be computed as
$$S_f(\omega) = \mathbf{H}\,(i\omega\mathbf{I} - \mathbf{F})^{-1}\,\mathbf{L}\,q\,\mathbf{L}^{\mathsf{T}}\,(-i\omega\mathbf{I} - \mathbf{F})^{-\mathsf{T}}\,\mathbf{H}^{\mathsf{T}} \qquad (38)$$
The stationary covariance function of $f(t)$ is related to its spectral density through the inverse Fourier transform
$$\kappa(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_f(\omega)\, e^{i\omega\tau}\, d\omega$$
In terms of the state-space matrices, $\kappa(\tau)$ can be calculated as
$$\kappa(\tau) = \mathbf{H}\, e^{\mathbf{F}\,|\tau|}\,\mathbf{P}_\infty\,\mathbf{H}^{\mathsf{T}}$$
Hartikainen and Särkkä  showed that it is possible to form $\mathbf{F}$, $\mathbf{L}$ and $\mathbf{H}$ such that $f(t)$ has the desired covariance function only if the spectral density of $f(t)$ has a proper rational form
$$S_f(\omega) = \frac{c}{b_0 + b_1\,\omega^2 + \dots + b_m\,\omega^{2m}}$$
where $c$ and $b_0, \dots, b_m$ are constants. By applying spectral factorization, the spectral density can be expressed, similarly to Equation 38, as
$$S_f(\omega) = G(i\omega)\, q\, G(-i\omega)$$
where $G(i\omega)$ is a stable rational transfer function, i.e. all of its poles lie in the left half-plane. The procedure to convert a covariance function into an SSM is outlined as follows :
1. The spectral density $S_f(\omega)$ is computed by taking the Fourier transform of $\kappa(\tau)$.
2. Spectral factorization of $S_f(\omega)$ yields the stable transfer function $G(i\omega)$ and the spectral density $q$ of the driving white noise.
3. Finally, the transfer function model is converted into an equivalent SSM driven by a scalar-valued white noise process with spectral density $q$.
Many GPs having stationary covariance functions can be expressed in state-space form as in Equation 36, with model matrices defined by $\mathbf{F}$, $\mathbf{L}$, $\mathbf{H}$, and the spectral density $q$. It should be noted that non-stationary covariance functions can also be converted into state-space form ; however, the Fourier transform approach is no longer applicable.
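As a concrete instance of this procedure, the Matérn 3/2 covariance $\kappa(\tau) = \sigma^2(1 + \lambda|\tau|)\,e^{-\lambda|\tau|}$ with $\lambda = \sqrt{3}/\ell$ admits the companion-form model $\mathbf{F} = \begin{bmatrix}0 & 1\\ -\lambda^2 & -2\lambda\end{bmatrix}$, $\mathbf{L} = [0, 1]^{\mathsf{T}}$, $\mathbf{H} = [1, 0]$, $q = 4\lambda^3\sigma^2$ (a standard result from the state-space GP literature, stated here without derivation). The sketch below checks numerically that the steady-state Lyapunov solution reproduces the covariance function via $\kappa(\tau) = \mathbf{H}\,e^{\mathbf{F}\tau}\,\mathbf{P}_\infty\,\mathbf{H}^{\mathsf{T}}$:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

sigma, ell = 1.5, 0.4                               # illustrative hyperparameters
lam = np.sqrt(3.0) / ell

F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])   # companion matrix
L = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])                          # f(t) = H z(t)
q = 4.0 * lam**3 * sigma**2                         # white-noise spectral density

# Steady state: F P + P F^T + L q L^T = 0  (continuous-time Lyapunov equation)
P_inf = solve_continuous_lyapunov(F, -q * (L @ L.T))

# Stationary covariance recovered from the SSM vs. the closed form
tau = 0.3
k_ssm = (H @ expm(F * tau) @ P_inf @ H.T)[0, 0]
k_true = sigma**2 * (1.0 + lam * tau) * np.exp(-lam * tau)
```

The (1,1) entry of the steady-state covariance equals the signal variance $\sigma^2$, consistent with $\kappa(0) = \sigma^2$.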
4.2 GP-LFM in joint state-space form
Consider again the system model of Equation 21, in which each component $f_i(t)$ of the latent force vector $\mathbf{f}(t)$ is modelled as a GP. For each component, given a stationary covariance function $\kappa_i(\tau)$ with hyperparameters $\boldsymbol{\theta}_i$, a state-space representation can be constructed using the above procedure. The state-space representation for the $i$th component of $\mathbf{f}(t)$ is given by
$$\dot{\mathbf{z}}_i(t) = \mathbf{F}_i\,\mathbf{z}_i(t) + \mathbf{L}_i\, w_i(t), \qquad f_i(t) = \mathbf{H}_i\,\mathbf{z}_i(t) \qquad (44)$$
where $\mathbf{z}_i(t) \in \mathbb{R}^{m_i}$ is the state vector for the $i$th latent force, $w_i(t)$ is a zero-mean Gaussian white noise process with spectral density $q_i$, and $\mathbf{F}_i$, $\mathbf{L}_i$, $\mathbf{H}_i$ are the state-space matrices, with $m_i$ being the dimension of the state vector $\mathbf{z}_i(t)$, $i = 1, \dots, n_f$. Formally, the linear time-invariant system represented in Equation 44 is a stochastic differential equation in which $w_i(t)\,dt$ represents an increment of Brownian motion, and the mathematical description is given via stochastic integrals (see e.g.  for an account of this). The state-space form of the LFM can be obtained by combining the component GP SSMs with the system SSM to yield the following augmented model:
$$\dot{\mathbf{x}}(t) = \mathbf{A}_c\,\mathbf{x}(t) + \sum_{i=1}^{n_f} \mathbf{b}_{c,i}\,\mathbf{H}_i\,\mathbf{z}_i(t), \qquad \dot{\mathbf{z}}_i(t) = \mathbf{F}_i\,\mathbf{z}_i(t) + \mathbf{L}_i\, w_i(t) \qquad (45)$$
where $\mathbf{b}_{c,i}$ are the columns of the matrix $\mathbf{B}_c$, and $\mathbf{w}(t) = [w_1(t), \dots, w_{n_f}(t)]^{\mathsf{T}}$ is a vector-valued Gaussian white noise process with spectral densities $q_i$, $i = 1, \dots, n_f$. Equation 45 can be written using block matrices as follows:
$$\frac{d}{dt}\begin{bmatrix}\mathbf{x}(t)\\ \mathbf{z}_1(t)\\ \vdots\\ \mathbf{z}_{n_f}(t)\end{bmatrix} = \begin{bmatrix}\mathbf{A}_c & \mathbf{b}_{c,1}\mathbf{H}_1 & \cdots & \mathbf{b}_{c,n_f}\mathbf{H}_{n_f}\\ \mathbf{0} & \mathbf{F}_1 & & \\ \vdots & & \ddots & \\ \mathbf{0} & & & \mathbf{F}_{n_f}\end{bmatrix}\begin{bmatrix}\mathbf{x}(t)\\ \mathbf{z}_1(t)\\ \vdots\\ \mathbf{z}_{n_f}(t)\end{bmatrix} + \begin{bmatrix}\mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{L}_1 & & \\ & \ddots & \\ & & \mathbf{L}_{n_f}\end{bmatrix}\begin{bmatrix}w_1(t)\\ \vdots\\ w_{n_f}(t)\end{bmatrix} \qquad (46)$$
In shorthand notation, the augmented SSM shown in Equation 46 can be represented by
$$\dot{\mathbf{x}}_a(t) = \mathbf{A}_a\,\mathbf{x}_a(t) + \mathbf{L}_a\,\mathbf{w}(t), \qquad \mathbf{y}_k = \mathbf{C}_a\,\mathbf{x}_a(t_k) + \mathbf{v}_k \qquad (48)$$
Here $\mathbf{x}_a(t) = \big[\mathbf{x}(t)^{\mathsf{T}},\, \mathbf{z}_1(t)^{\mathsf{T}},\, \dots,\, \mathbf{z}_{n_f}(t)^{\mathsf{T}}\big]^{\mathsf{T}}$ is the augmented state vector, and $\mathbf{w}(t)$ is a concatenated vector-valued Gaussian process with spectral density matrix
$$\mathbf{Q}_c = \operatorname{diag}\big(q_1, \dots, q_{n_f}\big)$$
The matrices $\mathbf{A}_a$ and $\mathbf{L}_a$ are the block matrices of Equation 46, and $\mathbf{C}_a = \begin{bmatrix}\mathbf{C} & \mathbf{d}_1\mathbf{H}_1 & \cdots & \mathbf{d}_{n_f}\mathbf{H}_{n_f}\end{bmatrix}$, where $\mathbf{d}_i$ are the columns of $\mathbf{D}$ and the augmented state dimension is $n_a = 2n + \sum_{i=1}^{n_f} m_i$.
4.3 Joint posterior inference of inputs and states
The discrete-time form of the augmented SSM (Equation 48) can be obtained as
$$\mathbf{x}_{a,k} = \mathbf{A}_{a,d}\,\mathbf{x}_{a,k-1} + \mathbf{w}_{k-1}, \qquad \mathbf{y}_k = \mathbf{C}_a\,\mathbf{x}_{a,k} + \mathbf{v}_k \qquad (50)$$
where the state-space matrices are $\mathbf{A}_{a,d} = e^{\mathbf{A}_a \Delta t}$ and $\mathbf{C}_a$. Here $\mathbf{w}_{k-1}$ is a zero-mean Gaussian white noise vector representing the discrete-time form of $\mathbf{L}_a\,\mathbf{w}(t)$, whose covariance is given by
$$\mathbf{Q}_d = \int_0^{\Delta t} e^{\mathbf{A}_a \tau}\,\mathbf{L}_a\,\mathbf{Q}_c\,\mathbf{L}_a^{\mathsf{T}}\, e^{\mathbf{A}_a^{\mathsf{T}} \tau}\, d\tau \qquad (51)$$
where $e^{\mathbf{A}_a \tau}$ denotes the matrix exponential. The integral in Equation 51 is solved using matrix fraction decomposition (see  for implementation details). Zero-mean Gaussian white noise with covariance $\mathbf{Q}$ can be added to the process model in Equation 50 to account for unmodelled dynamics. The resulting modified form of Equation 50 can be written as
$$\mathbf{x}_{a,k} = \mathbf{A}_{a,d}\,\mathbf{x}_{a,k-1} + \tilde{\mathbf{w}}_{k-1}$$
where $\tilde{\mathbf{w}}_{k-1}$ is the modified white noise vector with modified covariance as defined below:
$$\tilde{\mathbf{Q}} = \mathbf{Q}_d + \mathbf{Q}$$
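The matrix fraction decomposition referred to above can be implemented with a single matrix exponential (the Van Loan construction). The sketch below applies it to an illustrative Matérn-3/2 force block rather than the full augmented system, and uses the stationarity identity $\mathbf{P}_\infty = \mathbf{A}_d\,\mathbf{P}_\infty\,\mathbf{A}_d^{\mathsf{T}} + \mathbf{Q}_d$ as a consistency check:

```python
import numpy as np
from scipy.linalg import expm

lam = np.sqrt(3.0) / 0.4                  # Matern-3/2 block, ell = 0.4, sigma = 1
A = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
L = np.array([[0.0], [1.0]])
q = 4.0 * lam**3                          # white-noise spectral density
dt = 0.01
n = A.shape[0]

# Exponentiate the block matrix [[A, L q L^T], [0, -A^T]] once, then read off
# Ad = e^{A dt} and Qd = M12 Ad^T, i.e. the integral of Equation 51.
Phi = np.block([[A, q * (L @ L.T)],
                [np.zeros((n, n)), -A.T]])
M = expm(Phi * dt)
Ad = M[:n, :n]
Qd = M[:n, n:] @ Ad.T
```

This avoids explicit numerical quadrature of the covariance integral: one matrix exponential of a doubled-size block matrix yields both the transition matrix and the discrete noise covariance.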
The noise terms $\tilde{\mathbf{w}}_k$ and $\mathbf{v}_k$ are assumed to be uncorrelated with each other, and their joint covariance is expressed through Equation 11.
The posterior distribution of the states and the forces can be estimated with the classical Kalman filter (and smoother). Here, $\boldsymbol{\theta}$ is a vector comprising all covariance function hyperparameters, i.e. $\boldsymbol{\theta} = \{\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_{n_f}\}$. The estimation is started from a Gaussian prior $\mathbf{x}_{a,0} \sim \mathcal{N}(\mathbf{0}, \mathbf{P}_{a,0})$, where the covariance matrix has the following block-diagonal form
$$\mathbf{P}_{a,0} = \operatorname{blkdiag}\big(\mathbf{P}_{x,0},\, \mathbf{P}_{\infty,1},\, \dots,\, \mathbf{P}_{\infty,n_f}\big)$$
$\mathbf{P}_{x,0}$ is the initial covariance matrix for the non-augmented state vector, chosen according to some prior knowledge. The covariance matrix for the $i$th latent force is set equal to the steady-state covariance matrix $\mathbf{P}_{\infty,i}$ obtained using Equation 37, $i = 1, \dots, n_f$.
4.4 Hyperparameter optimization for GP covariance functions
The structural model parameter matrices, together with the parameters of the covariance functions corresponding to the latent forces, become the parameters of the augmented GPLFM state-space model. As already stated in Section 2.2, the structural system parameters are assumed to be known a priori (either from an FE model or from system identification), and therefore the augmented state estimation results depend only on the parameters of the chosen covariance functions for the latent force components. In general, a parametric family of covariance functions is chosen and the hyperparameters are optimized based on the measurement data. Typical hyperparameters include the lengthscale and signal variance for a standard family of covariance functions. The hyperparameters can be estimated in different ways, including maximization of the marginal likelihood , maximum a posteriori estimation , and Markov chain Monte Carlo methods. In this study, the optimized hyperparameters are obtained by maximizing the likelihood function based on the measurements. Maximum likelihood estimates of the hyperparameters (i.e., signal variance and lengthscale) of the covariance function(s) can be obtained by minimizing the negative log-likelihood (or maximizing the log-likelihood) of the measurements as follows:
$$\boldsymbol{\theta}^{*} = \arg\min_{\boldsymbol{\theta}}\; \big[-\log p(\mathbf{y}_{1:N} \mid \boldsymbol{\theta})\big]$$
Using the Kalman filter recursions (see B), the likelihood can be factorized as $p(\mathbf{y}_{1:N} \mid \boldsymbol{\theta}) = \prod_{k=1}^{N} p(\mathbf{y}_k \mid \mathbf{y}_{1:k-1}, \boldsymbol{\theta})$, where each predictive distribution is Gaussian with mean $\mathbf{C}_a\,\hat{\mathbf{x}}_{a,k}^{-}$, $\hat{\mathbf{x}}_{a,k}^{-}$ being the $k$th predicted augmented state vector ($\hat{\mathbf{x}}_{a,0}$ being the estimate of the initial augmented state vector). The minimization of the negative log-likelihood of the measurements can then be expressed in terms of the innovations $\mathbf{e}_k$ and the innovation covariances $\mathbf{S}_k$ (see  for more details) as:
$$\boldsymbol{\theta}^{*} = \arg\min_{\boldsymbol{\theta}} \sum_{k=1}^{N} \left[\tfrac{1}{2}\log\big|2\pi\,\mathbf{S}_k(\boldsymbol{\theta})\big| + \tfrac{1}{2}\,\mathbf{e}_k(\boldsymbol{\theta})^{\mathsf{T}}\,\mathbf{S}_k(\boldsymbol{\theta})^{-1}\,\mathbf{e}_k(\boldsymbol{\theta})\right]$$
The expressions for $\mathbf{e}_k$ and $\mathbf{S}_k$ are provided in Equations 77 and 78, respectively. The minimization can be done using optimization tools such as MATLAB's built-in functions fminunc or fmincon. It is worth noting that maximum likelihood optimization may get stuck in a local minimum; to avoid this, one may need to start the optimization from different initial points. The steps involved in the proposed algorithm are shown in Algorithm 1.
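To make the innovation form concrete, the sketch below evaluates this negative log-likelihood with a scalar Kalman filter and uses it to pick the process-noise variance of a toy random-walk model by grid search. The model and all values are our own illustration; the paper instead optimizes the GP hyperparameters of the augmented model, e.g. with fminunc:

```python
import numpy as np

def kf_nll(y, A, Q, C, R, m0, P0):
    """Negative log-likelihood: sum_k 0.5*log(2 pi S_k) + 0.5*e_k^2/S_k."""
    m, P, nll = m0, P0, 0.0
    for yk in y:
        m, P = A * m, A * P * A + Q        # predict
        e = yk - C * m                     # innovation e_k
        S = C * P * C + R                  # innovation covariance S_k
        nll += 0.5 * (np.log(2.0 * np.pi * S) + e * e / S)
        K = P * C / S                      # Kalman gain
        m, P = m + K * e, (1.0 - K * C) * P
    return nll

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(200))    # random walk, true process variance 1
y = x + 0.1 * rng.standard_normal(200)     # measurements, noise variance 0.01
best_Q = min([0.01, 0.1, 1.0, 10.0],
             key=lambda Q: kf_nll(y, 1.0, Q, 1.0, 0.01, 0.0, 1.0))
```

The grid search recovers the true process-noise variance: too small a value of `Q` inflates the weighted innovation terms, while too large a value inflates the log-determinant terms, so the likelihood is minimized near the correct setting.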