I Introduction
System identification is a well-established area of automatic control [3, 33, 52]. A wide range of identification methods have been developed for parametric and nonparametric models as well as for grey-box [25] or black-box models [46]. In contrast, the field of machine learning [6, 35], and especially deep learning [17, 27], has emerged as the new standard in many disciplines to model highly complex systems. A large number of deep learning based tools have been developed for a broad spectrum of applications. Deep learning can identify and capture patterns as a black-box model. It has been shown to be useful for high-dimensional and nonlinear problems emerging in diverse areas such as image analysis [20, 29], time series modelling [26], speech recognition [11] and text classification [57]. This paper provides one step towards combining the areas of system identification and deep learning [32] by showing the usefulness of deep SSMs applied to nonlinear system identification. It helps to bridge the gap between the two fields and to learn from each other's advances.

Nowadays, a wide range of system identification algorithms for parametric models are available [31, 50]. Parametric models such as SSMs can include preexisting knowledge about the system and its structure and can thereby obtain more precise identification results. Similarly to hidden Markov models [39], SSMs can be more expressive than, e.g., autoregressive models due to their use of hidden states. For automatic control this is a popular model class and a variety of identification algorithms is available [44, 53].

In the deep learning community there have been recent advances in the development of deep SSMs, see e.g. [1, 4, 10, 12, 13, 16, 28, 40]. The class of deep SSMs has three main advantages. (1) It is highly flexible due to the use of Neural Networks (NNs) and can capture a wide range of system dynamics. (2) Similar to SSMs, it is more expressive than standard NNs because of the hidden variables. (3) In addition to the system dynamics, deep SSMs also capture the output uncertainty. These advantages have been exploited for the generation of handwritten text [30] and speech [38]. These examples have highly nonlinear dynamics and require capturing the uncertainty in order to generate new realistic sequences. The main contributions of this paper are:

Bring the communities of system identification and deep learning closer by giving insight into a deep learning model class and its learning algorithm and by applying it to system identification problems. This extends the toolbox of possible black-box identification approaches with a new class of deep learning models. This paper complements the work in [2], where deterministic NNs are applied to nonlinear system identification.

The system identification community defines a clear separation between model structures and parameter estimation methods. In this paper the same distinction between model structures (Section II) and the learning algorithm to estimate the model parameters (Section III) is taken as a future guideline for deep learning.
Six deep SSMs are implemented and compared for nonlinear system identification (Section IV). The advantages of the models are highlighted in the experiments by showing that a maximum likelihood estimate is obtained and that, additionally, the uncertainty is captured. Hence, a richer representation of the system dynamics is identified, which is beneficial for example in robust control or system analysis.
II Deep State Space Models for Sequential Data
Deep learning is a highly active field with research in many directions. One active topic is sequence modeling, motivated by the temporal nature of the physical environment. A dynamic model is required to replicate the dynamics of the system. The model is a mapping from observed past inputs u_{1:t} and outputs y_{1:t−1} to predicted outputs ŷ_t. An SSM is obtained if the computations are performed via a latent variable x_t that incorporates past information:
x_{t+1} = f(x_t, u_t, y_t; θ),   (1a)
ŷ_t = g(x_t, u_t; θ),   (1b)
where θ denotes the set of unknown parameters. If the functions f and g are described by deep mappings such as deep NNs, the resulting model is referred to as a deep SSM.
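To make (1) concrete, the mappings f and g can be sketched as small feedforward NNs. All dimensions, layer sizes and the random (untrained) weights below are illustrative assumptions, not values from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    # Random small feedforward net as a stand-in for a trained deep mapping.
    return [(rng.normal(0, 0.3, (m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, v):
    for i, (W, b) in enumerate(params):
        v = W @ v + b
        if i < len(params) - 1:
            v = np.tanh(v)  # nonlinearity on hidden layers only
    return v

nx, nu, ny = 3, 1, 1             # state, input and output dimensions (assumed)
f = mlp([nx + nu + ny, 16, nx])  # state transition, cf. (1a)
g = mlp([nx + nu, 16, ny])       # output mapping,   cf. (1b)

def predict(u, y):
    """One-step-ahead predictions from measured inputs u and outputs y."""
    x, y_hat = np.zeros(nx), []
    for t in range(len(u)):
        y_hat.append(forward(g, np.concatenate([x, u[t]])))  # (1b)
        x = forward(f, np.concatenate([x, u[t], y[t]]))      # (1a)
    return np.array(y_hat)

u = rng.normal(size=(50, nu))
y = rng.normal(size=(50, ny))
y_hat = predict(u, y)  # shape (50, 1)
```

In practice the weights would of course be estimated from data, as discussed in Section III.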
A second deep learning research direction is that of generative models, involving model structures such as generative adversarial networks (GANs) [18] and Variational Autoencoders (VAEs) [24, 41], which are used to learn representations of the data and to generate new instances from the same distribution. For example, realistic images can be generated from these models [54]. Extending VAEs to sequential models [14] produces the subclass of deep SSMs which is studied in this paper. The building blocks for this type of model are Recurrent NNs (RNNs) and VAEs.

II-A Recurrent Neural Networks
RNNs are NNs suited to modeling sequences of variable length [17]. Models with external inputs and outputs at each time step are considered. RNNs make use of a hidden state h_t, similar to (1a) but without considering the outputs for the state update. A block diagram is given in Fig. 1, showing the similarity to classic SSMs. The figure highlights that the function parameters are learned by unfolding the RNN and using backpropagation through time [17]. Often a regularized L2-loss between the predicted output and the true output is considered. The most successful types of RNNs for long-term dependencies are Long Short-Term Memory (LSTM) networks [21] and Gated Recurrent Units (GRUs) [8], which yield empirically similar results [9]. GRUs are used within this paper due to their structural simplicity.

II-B Variational Autoencoders
A VAE [24, 41] embeds a representation of the data distribution of y in a low-dimensional latent variable z via an inference network (encoder). A decoder network uses z to generate new data of approximately the same distribution as y. The conceptual idea of a VAE is visualized in Fig. 2; it can be viewed as a latent variable model. The dimension of z is a hyperparameter.

In the VAE it is in general assumed that the data y has a normal distribution. Therefore, the decoder is chosen accordingly as p_θ(y | z) = N(μ_x, diag(σ_x²)). The parameters of this distribution are given by a deep NN with parameters θ, input z and outputs μ_x and σ_x. Hence, the generative model is characterized by the joint distribution p_θ(y, z) = p_θ(y | z) p_θ(z), where a multivariate normal distribution p_θ(z) = N(μ_0, diag(σ_0²)) is used as prior. The prior parameters are usually chosen to be μ_0 = 0, σ_0 = 1.

For the data embedding in z, the distribution of interest is the posterior p_θ(z | y), which is intractable in general. Instead of solving for the posterior, it is approximated by a parametric distribution q_φ(z | y). The distribution parameters are encoded by a deep NN. This network is optimized by variational inference [7, 22] of the variational parameters φ, which are shared over all data points, using an amortized version [56].
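As an illustration, the encoder/decoder pair and a draw from q_φ(z | y) via the standard reparameterization trick of [24] can be sketched as follows; the single-hidden-layer networks and random weights are assumptions made for the sake of the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n_y, n_z, n_h = 4, 2, 8  # data, latent and hidden dimensions (assumed)

# Encoder weights for q_phi(z | y) and decoder weights for p_theta(y | z).
W1  = rng.normal(0, 0.3, (n_h, n_y))
W2m = rng.normal(0, 0.3, (n_z, n_h))
W2s = rng.normal(0, 0.3, (n_z, n_h))
V1  = rng.normal(0, 0.3, (n_h, n_z))
V2m = rng.normal(0, 0.3, (n_y, n_h))
V2s = rng.normal(0, 0.3, (n_y, n_h))

def encode(y):
    h = np.tanh(W1 @ y)
    return W2m @ h, np.exp(W2s @ h)  # mean and (positive) std of q(z | y)

def decode(z):
    h = np.tanh(V1 @ z)
    return V2m @ h, np.exp(V2s @ h)  # mean and std of p(y | z)

y = rng.normal(size=n_y)
mu_z, sig_z = encode(y)
z = mu_z + sig_z * rng.normal(size=n_z)  # reparameterization trick
mu_y, sig_y = decode(z)
```

Exponentiating the network output for the standard deviations is one common way to keep them positive.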
There exists a connection between the VAE and linear dimension reduction methods such as PCA. In [43] it is shown that the PCA corresponds to a linear Gaussian model. Specifically, the VAE can be viewed as a nonlinear generalization of the probabilistic PCA.
II-C Combining RNNs and VAEs into deep SSMs
RNNs can be viewed as a special case of classic SSMs with Dirac delta functions as state transition distribution [14], see (1a) for comparison. The VAE can be used to approximate the output distribution of the dynamics, see (1b). A temporal extension of the VAE is needed for the studied class of deep SSMs. The parameters of the VAE prior are updated sequentially with the output of an RNN. The state transition distribution is given by z_t ∼ N(μ_{0,t}, diag(σ_{0,t}²)), where the parameters (μ_{0,t}, σ_{0,t}) are computed from the recurrent state h_t. Compared with the VAE prior, the parameters are now not static but dependent on previous time steps, which describes the recurrent nature of the model. Similarly, the output distribution is given as y_t | z_t ∼ N(μ_{x,t}, diag(σ_{x,t}²)) with parameters computed from z_t. The joint distribution of the deep SSM is
p_θ(y_{1:T}, z_{1:T} | u_{1:T}) = ∏_{t=1}^{T} p_θ(y_t | z_{1:t}, u_{1:t}) p_θ(z_t | z_{1:t−1}, u_{1:t}).   (2)
Similar to the VAE, this expression describes the generative process. It can be further decomposed with a clear separation between the RNN and the VAE. The simplest form within the studied class of deep SSMs is then obtained, the so-called VAE-RNN [14]. The model consists of stacking a VAE on top of an RNN, as shown in Fig. 3. Notice the clear separation between model parameter learning in the inference network with the available data and the output prediction in the generative network. The joint true posterior can be factorized according to the graphical model as
p_θ(z_{1:T} | y_{1:T}, u_{1:T}) = ∏_{t=1}^{T} p_θ(z_t | y_t, h_t),   (3)
with prior given by p_θ(z_t | h_t), depending only on the recurrent state h_t. The approximate posterior can be chosen to mimic the same factorization
q_φ(z_{1:T} | y_{1:T}, u_{1:T}) = ∏_{t=1}^{T} q_φ(z_t | y_t, h_t).   (4)
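Generating an output sequence from such a model then alternates the deterministic recurrence, sampling z_t from the prior and decoding y_t. In the sketch below, a plain tanh recurrence stands in for the GRU and single linear layers stand in for the prior and decoder networks (all illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n_u, n_h, n_z, n_y = 1, 8, 2, 1  # assumed dimensions

Wh = rng.normal(0, 0.3, (n_h, n_h))
Wu = rng.normal(0, 0.3, (n_h, n_u))
P  = rng.normal(0, 0.3, (2 * n_z, n_h))        # prior net: h_t -> (mu, log sigma)
D  = rng.normal(0, 0.3, (2 * n_y, n_z + n_h))  # decoder: (z_t, h_t) -> (mu, log sigma)

def generate(u):
    """Sample an output sequence given inputs u_{1:T}."""
    h, ys = np.zeros(n_h), []
    for t in range(len(u)):
        h = np.tanh(Wh @ h + Wu @ u[t])               # deterministic recurrence
        mu0, ls0 = np.split(P @ h, 2)
        z = mu0 + np.exp(ls0) * rng.normal(size=n_z)  # z_t ~ p(z_t | h_t)
        muy, lsy = np.split(D @ np.concatenate([z, h]), 2)
        ys.append(muy + np.exp(lsy) * rng.normal(size=n_y))  # y_t ~ p(y_t | z_t, h_t)
    return np.array(ys)

y_sim = generate(rng.normal(size=(30, n_u)))  # shape (30, 1)
```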
There are multiple variations within this class of deep SSMs, besides the VAE-RNN [14]. The ones considered in this paper are:

Variational RNN (VRNN) [10]: Based on the VAE-RNN, but the recurrence additionally uses the previous latent variable z_{t−1} for the update of the recurrent state.

VRNN-I [10]: Same as the VRNN, but a static prior is used in every time step.

Stochastic RNN (STORN) [4]: Based on the VRNN-I. In the inference network, STORN additionally uses a forward-running RNN over the measured outputs; the approximate posterior over z_t is then characterized by the state of this second RNN.
Graphical models for these extensions are provided in Appendix A. For the VRNN and VRNN-I, an additional version using Gaussian mixtures as output distribution (VRNN-GMM) is studied. More methods are available in the literature, see e.g. [1, 12, 13, 16].
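For intuition, the forward inference recurrence of STORN might be sketched as follows, with the RNN state parameterizing the approximate posterior at each step; the layer structure and sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n_y, n_a, n_z = 1, 8, 2  # hypothetical dimensions

Wa = rng.normal(0, 0.3, (n_a, n_a))
Wy = rng.normal(0, 0.3, (n_a, n_y))
Q  = rng.normal(0, 0.3, (2 * n_z, n_a))  # maps RNN state to (mu, log sigma) of q

def inference(y):
    """Forward recurrence over measured outputs; returns q-parameters per step."""
    a, params = np.zeros(n_a), []
    for t in range(len(y)):
        a = np.tanh(Wa @ a + Wy @ y[t])
        mu, ls = np.split(Q @ a, 2)
        params.append((mu, np.exp(ls)))
    return params

params = inference(rng.normal(size=(20, n_y)))
```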
III Model Parameter Learning
III-A Cost Function for the VAE
The parameter learning method for the deep SSMs is based on the same method as for VAEs. The VAE parameters are learned by maximum likelihood estimation with N data points {y^(i)}_{i=1}^{N}. Performing variational inference with shared parameters for all data points, one obtains for a single data point y
log p_θ(y) = log ∫ p_θ(y, z) dz   (5)
           = log ∫ q_φ(z | y) ( p_θ(y, z) / q_φ(z | y) ) dz   (6)
           ≥ E_{q_φ(z | y)} [ log ( p_θ(y, z) / q_φ(z | y) ) ],   (7)
where Jensen's inequality is used in (7). The right-hand side is referred to as the evidence lower bound (ELBO) and can be rewritten using the Kullback-Leibler (KL) divergence as
L(θ, φ; y) = E [ log p_θ(y | z) ] − KL( q_φ(z | y) ‖ p_θ(z) ),   (8)
where the expectation is with respect to q_φ(z | y). The first term encourages the reconstruction of the data by the decoder. The KL divergence in the second term is a measure of closeness between the two distributions and can be interpreted as a regularization term: approximate posterior distributions far away from the prior are penalized. The total ELBO is then given by the sum over all data points, which is maximized instead of the intractable log-likelihood.
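Since the approximate posterior and the prior are both diagonal Gaussians, the KL term in (8) has a well-known closed form, which the following sketch evaluates on hand-picked numbers (the values are arbitrary, chosen only for illustration):

```python
import numpy as np

def kl_diag_gauss(mu_q, sig_q, mu_p, sig_p):
    """KL( N(mu_q, diag sig_q^2) || N(mu_p, diag sig_p^2) ) in closed form."""
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def gauss_loglik(y, mu, sig):
    """log N(y; mu, diag sig^2): the reconstruction term of the ELBO."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sig)
                  - 0.5 * ((y - mu) / sig)**2)

mu_q, sig_q = np.array([0.3, -0.1]), np.array([0.8, 1.2])
elbo = (gauss_loglik(np.array([0.5]), np.array([0.4]), np.array([1.0]))
        - kl_diag_gauss(mu_q, sig_q, np.zeros(2), np.ones(2)))
```

The KL term vanishes when the approximate posterior equals the prior, which makes the regularizing effect explicit.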
III-B Cost Function for Deep SSMs
A temporal extension of the VAE parameter learning is required for the studied deep SSMs. Again, amortized variational inference with ELBO maximization is used. A derivation similar to that of the total ELBO of the VAE in (7) leads, for the generic deep SSM, to
L(θ, φ) = E [ log ( p_θ(y_{1:T}, z_{1:T} | u_{1:T}) / q_φ(z_{1:T} | y_{1:T}, u_{1:T}) ) ],   (9)
where the expectation is w.r.t. the approximate distribution q_φ(z_{1:T} | y_{1:T}, u_{1:T}). The factorization of the true joint distribution from (2) can be applied, which yields a total ELBO as the sum over all time steps. Note that in this generic scheme the approximate posterior could be factorized as q_φ(z_{1:T} | y_{1:T}, u_{1:T}) = ∏_{t=1}^{T} q_φ(z_t | z_{1:t−1}, y_{1:T}, u_{1:T}), which requires a smoothing step since z_t depends on all inputs and outputs from t = 1 to T. If there were a similar factorization for the approximate posterior as in (2), one could obtain a similar expression as for the VAE in (8).
In the VAE-RNN, a solution for parameter learning is obtained due to the clear separation between the RNN and the VAE. Note that here no smoothing step for the variational distribution is necessary, since the latent variables z_t are independent given h_t, as can be seen by d-separation in Fig. 3. The same factorization as in (4) can be used. The total ELBO for the VAE-RNN can then be written as
L(θ, φ) = E [ Σ_{t=1}^{T} ( log p_θ(y_t | z_t, h_t) + log p_θ(z_t | h_t) − log q_φ(z_t | y_t, h_t) ) ],   (10)
where the expectation is taken w.r.t. the approximate posterior q_φ(z_{1:T} | y_{1:T}, u_{1:T}). Applying the posterior factorizations (3) and (4) to the total ELBO in (10) and taking the expectation w.r.t. q_φ(z_t | y_t, h_t) yields
L(θ, φ) = Σ_{t=1}^{T} ( E_{q_φ(z_t | y_t, h_t)} [ log p_θ(y_t | z_t, h_t) ] − KL( q_φ(z_t | y_t, h_t) ‖ p_θ(z_t | h_t) ) ),   (11)
which is of the same form as the VAE ELBO in (8), but with a temporal extension as a summation over all time steps.
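A per-time-step ELBO of this form could be accumulated as sketched below; a tanh recurrence stands in for the GRU and single linear layers with random weights stand in for the prior, encoder and decoder networks (all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n_u, n_y, n_h, n_z = 1, 1, 8, 2  # assumed dimensions

Wh = rng.normal(0, 0.3, (n_h, n_h))
Wu = rng.normal(0, 0.3, (n_h, n_u))
P  = rng.normal(0, 0.3, (2 * n_z, n_h))        # prior p(z_t | h_t)
E  = rng.normal(0, 0.3, (2 * n_z, n_h + n_y))  # posterior q(z_t | y_t, h_t)
D  = rng.normal(0, 0.3, (2 * n_y, n_z + n_h))  # decoder p(y_t | z_t, h_t)

def kl(mq, sq, mp, sp):
    return np.sum(np.log(sp / sq) + (sq**2 + (mq - mp)**2) / (2 * sp**2) - 0.5)

def loglik(y, mu, sig):
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sig) - 0.5 * ((y - mu) / sig)**2)

def sequence_elbo(u, y):
    h, elbo = np.zeros(n_h), 0.0
    for t in range(len(u)):
        h = np.tanh(Wh @ h + Wu @ u[t])                       # recurrence
        m0, ls0 = np.split(P @ h, 2)                          # prior params
        mq, lsq = np.split(E @ np.concatenate([h, y[t]]), 2)  # posterior params
        z = mq + np.exp(lsq) * rng.normal(size=n_z)           # reparameterized sample
        my, lsy = np.split(D @ np.concatenate([z, h]), 2)
        elbo += loglik(y[t], my, np.exp(lsy)) - kl(mq, np.exp(lsq), m0, np.exp(ls0))
    return elbo

u = rng.normal(size=(100, n_u))
y = rng.normal(size=(100, n_y))
L = sequence_elbo(u, y)
```

Maximizing such a quantity with gradient ascent over the network weights is, in essence, what the learning algorithm in this section amounts to.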
IV Numerical Experiments
All six models described in Section II are evaluated. The model hyperparameters are the dimension of the latent variable z_t, the dimension of the GRU hidden state h_t, and the number of layers within the GRU networks. For STORN, the dimension of the inference RNN state is chosen equal to that of h_t. The VRNN-GMM uses 5 Gaussian mixture components in the output distribution.
For parameter learning as well as for hyperparameter and model selection, the data is split into training data and validation data. A separate test data set is used for evaluating the final performance. The ADAM optimizer [23] with default parameters is used together with early stopping and batch normalization [17]. The initial learning rate is decayed if the validation loss does not decrease for a specific number of epochs. The sequence length for training in mini-batches is considered as a design parameter.
Three experiments are conducted: (1) a linear Gaussian system, (2) the nonlinear Narendra-Li benchmark [36], and (3) the Wiener-Hammerstein process noise benchmark [45]. The first two experiments are chosen to show the power of deep SSMs for uncertainty quantification when the true uncertainty is known, while the last experiment serves as a more complex real-world example. The identified models are evaluated in open loop. The initial state is not estimated. The generated output sequences are compared with the true test data output. As performance metric, the root mean squared error (RMSE) is considered,

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (y_t − ŷ_t)² ),

where ŷ_t is taken as the mean of the model output distribution such that a fair comparison with maximum likelihood estimation methods can be made. To quantify the quality of the uncertainty estimate, the negative log-likelihood (NLL) per time step is used,

NLL = −(1/N) Σ_{t=1}^{N} log p_θ(y_t | u_{1:t}),

describing how likely it is that the true data point falls within the model output distribution. PyTorch code is available at https://github.com/dgedon/DeepSSM_SysID.

IV-A Toy Problem: Linear Gaussian System
Consider the following linear system with process noise w_t and measurement noise e_t:

x_{t+1} = A x_t + B u_t + w_t,   (12a)
y_t = C x_t + e_t.   (12b)
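For illustration, data from a system of the form (12) can be simulated as follows; the matrices and noise levels below are placeholder values, not the benchmark parameters used in the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder system: NOT the benchmark parameters, which are not given above.
A = np.array([[0.7, 0.8],
              [0.0, 0.1]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
q_std, r_std = 0.5, 1.0  # process / measurement noise std (assumed)

def simulate(u):
    x, ys = np.zeros(2), []
    for t in range(len(u)):
        ys.append(C @ x + r_std * rng.normal(size=1))      # (12b)
        x = A @ x + B @ u[t] + q_std * rng.normal(size=2)  # (12a)
    return np.array(ys)

y = simulate(rng.normal(size=(2000, 1)))  # one training-size record
```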
The models are trained and validated with 2 000 samples and tested on the same 5 000 samples. The same number of NN layers is used for all models, but with a different number of neurons per layer. A grid search over the latent and GRU hidden state dimensions is performed for the selection of the best architecture; small networks are chosen due to the simplicity of the experiment. For all models, the architecture with the lowest RMSE value is presented.

The deep SSMs are compared with two methods. First, a linear model is identified by SSEST [34] as a grey-box model with a prescribed number of states. SSEST also estimates the output variance, which is used for comparison. Second, the true system matrices, as the best possible linear model, are simulated in open loop without noise.
The results are presented in Table I, where the models are sorted from simple to more complex. For the deep SSMs, the values are averaged over 50 identified models, and for the comparison methods over 500 identifications, since these methods are computationally less expensive. The results indicate that the deep SSMs can reach an accuracy close to that of state-of-the-art methods. Note that SSEST assumes a linear model, whereas the deep SSMs fit a flexible, nonlinear model. The table also shows that the more complex the model, the more accurate the result. Note that no fine-tuning was necessary to obtain these results. A plot with mean and standard-deviation confidence interval for the test data and for STORN is given in Fig. 4. The figure shows that the dynamics are captured by STORN as well as by SSEST. Furthermore, the uncertainty is captured well, but it is conservatively overestimated: compared with the NLL value from SSEST, the uncertainty estimate is slightly more conservative.

Model                        | RMSE  | NLL   | (dim h, dim z)
VAE-RNN                      | 1.562 | 1.951 | (80, 10)
VRNN-Gauss-I                 | 1.477 | 1.817 | (50, 5)
VRNN-Gauss                   | 1.471 | 1.848 | (80, 2)
VRNN-GMM-I                   | 1.448 | 1.798 | (70, 10)
VRNN-GMM                     | 1.432 | 1.792 | (50, 5)
STORN                        | 1.427 | 1.785 | (60, 5)
SSEST [34]                   | 1.412 | 1.775 | –
true lin. model (noise free) | 1.398 | –     | –
IV-B Narendra-Li Benchmark
The dynamics of the Narendra-Li benchmark are given in [36], with additional measurement noise according to [47]. The benchmark is designed to be highly nonlinear, but it does not represent a real physical system. For more details, see the appendix.
This benchmark is evaluated for a varying number of training samples, ranging from 2 000 to 60 000. For each identification, 5 000 validation samples and the same 5 000 test samples are used. To choose the architecture, a grid search is performed. This revealed that, both for small and large training sample sizes, it is better to have larger networks. Hence, for comparability, all models are run with the same architecture. No batch normalization is applied.
The results are given in Fig. 5 and show averaged RMSE and NLL values over 30 estimated models for varying model and training data size. Generally, more training data yields more accurate estimates, both in terms of RMSE and NLL. After a specific amount of training data, the identification results stop improving. This plateau indicates that the chosen model size is saturated; larger models could be flexible enough to decrease the values even further. Specifically, the STORN model outperforms the other models, all of which show similar performance. This is due to the enhanced flexibility of STORN with the second recurrent network in the inference. Hence, more accurate state representations can be learned.
The lowest RMSE values of each model are compared in Table II with results from the literature. Note that the comparison methods do not estimate the uncertainty, hence no NLL can be given. Table II also includes the required number of samples to obtain the given performance. The table indicates that the deep SSM models in general require more samples for learning than classic models. In particular, STORN reaches RMSE values close to the state of the art. Note that grey-box models from the literature are compared with deep SSMs as black-box models, which can explain the performance gap.
A time evaluation of an open-loop run comparing the true dynamics with those identified by STORN is given in Fig. 6. Mean values and standard deviations are shown. The figure highlights two points. First, the complex and nonlinear dynamics are identified well by the deep SSM. Second, the uncertainty bounds are captured but are much more conservative than the true bounds. This is in line with empirical results in [19, 37], which show that variational inference based Bayesian methods perform less accurately than, for example, ensembling based methods.
Model                                         | RMSE  | NLL   | Samples
VAE-RNN                                       | 0.841 | 1.341 | 50 000
VRNN-Gauss-I                                  | 0.890 | 1.309 | 60 000
VRNN-Gauss                                    | 0.851 | 1.284 | 30 000
VRNN-GMM-I                                    | 0.869 | 1.289 | 20 000
VRNN-GMM                                      | 0.869 | 1.300 | 50 000
STORN                                         | 0.639 | 1.197 | 60 000
[55] Multivariate adaptive regression splines | 0.46  | –     | 2 000
[55] Adaptive hinging hyperplanes             | 0.31  | –     | 2 000
[47] Model-on-demand                          | 0.46  | –     | 50 000
[42] Direct weight optimization               | 0.43  | –     | 50 000
[49] Basis function expansion                 | 0.06  | –     | 2 000
IV-C Wiener-Hammerstein Process Noise Benchmark
The Wiener-Hammerstein benchmark with process noise [45] provides measured input-output data from an electric circuit. The system can be described by a nonlinear Wiener-Hammerstein model, which sandwiches a nonlinearity between two linear dynamic systems. Additional process noise enters before the nonlinearity, which makes the benchmark particularly difficult. The aim is to identify the behavior of the circuit for new input data.
The training data consist of 8 192 samples where the input is a faded multisine realization. The validation data are taken from the same data set but from a different realization. The test data set consists of 16 384 samples, one multisine realization and one swept sine. Preliminary tests indicate that a longer training sequence length yields more accurate results; hence a length of 2 048 points is used. This benchmark is evaluated for varying sizes of the deep SSM layers, while the remaining hyperparameters are kept constant.
The resulting RMSE values for the multisine and swept-sine test sequences are presented in Fig. 7. The lowest RMSE values are compared in Table III to state-of-the-art methods from the literature. The values are presented as averages over 20 identified models. The plot indicates that the influence of the layer size is rather limited; larger values, and therefore larger NNs, tend in general to yield more accurate identification results. Again, STORN yields the best results, while even the very simple VAE-RNN identifies this complex benchmark well. The jagged behaviour of the plot may arise because the chosen identification data set only consists of two realizations; therefore the randomness over the multiple identifications originates mainly from the random initialization of the NN weights.
Model                     | RMSE [swept sine] | RMSE [multisine]
VAE-RNN                   | 0.0495            | 0.0587
VRNN-Gauss-I              | 0.0763            | 0.0755
VRNN-Gauss                | 0.0817            | 0.0785
VRNN-GMM-I                | 0.0660            | 0.0669
VRNN-GMM                  | 0.0760            | 0.0736
STORN                     | 0.0338            | 0.0509
[5] NOBF                  | 0.2               | 0.3
[5] NFIR                  | 0.05              | 0.05
[5] NARX                  | 0.05              | 0.05
[51] PNLSS                | 0.022             | 0.038
[15] Best Linear Approx.  | –                 | 0.035
[15] ML                   | –                 | 0.0162
[48] SMC                  | 0.014             | 0.015
V Conclusion and Future Work
This paper provides an introduction to deep SSMs as an extension of classic SSMs using highly flexible NNs. The studied model class and the parameter learning method based on variational inference and ELBO maximization are elaborated. Six model instances are then applied to three system identification problems in order to benchmark the potential of these models. The results indicate that the class of deep SSMs is a competitive approach to classic identification methods. Note that deep SSMs are black-box models which only require a few hyperparameters to be tuned; the models in this benchmark study are not fine-tuned to obtain the presented results. Therefore, the toolbox of possible nonlinear system identification methods is extended by a new black-box model class based on deep learning. The studied models have the additional advantage of estimating the uncertainty of the system dynamics due to their probabilistic nature. The uncertainty bounds appear to be conservative compared with established uncertainty quantification methods. This conservative behavior is in line with the existing literature on variational inference for deep learning models.
This study only concerns a subclass of deep SSMs, namely models based on variational inference learning methods. Future work should study a broader class of deep SSMs, and more nonlinear system identification benchmarks should be considered. An interesting continuation is to study, for the linear toy problem, the performance of a one-step-ahead predictor model with the Kalman filter as ground truth. Similarly, for nonlinear systems, a comparison with the particle filter as ground truth in the one-step-ahead predictor setting can be considered. Finally, it is of interest to use deep SSMs in automatic control, e.g. in model predictive control, and to elaborate how to exploit the latent state variables.
References
[1] (2017) Z-Forcing: Training Stochastic Recurrent Networks. In Advances in Neural Information Processing Systems 30, pp. 6713–6723.
[2] (2019) Deep Convolutional Networks in System Identification. In Proceedings of the 58th IEEE Conference on Decision and Control, Nice, France.
[3] (1971) System identification—A survey. Automatica 7 (2), pp. 123–162.
[4] (2015) Learning Stochastic Recurrent Networks. arXiv:1411.7610.
[5] (2017) Automatic Modeling with Local Model Networks for Benchmark Processes. 20th IFAC World Congress 50 (1), pp. 470–475.
[6] (2006) Pattern recognition and machine learning. Springer-Verlag, New York.
[7] (2017) Variational Inference: A Review for Statisticians. Journal of the American Statistical Association 112 (518), pp. 859–877.
[8] (2014) On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, pp. 103–111.
[9] (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555.
[10] (2015) A Recurrent Latent Variable Model for Sequential Data. In Advances in Neural Information Processing Systems 28, pp. 2980–2988.
[11] (2013) New types of deep neural network learning for speech recognition and related applications: an overview. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8599–8603.
[12] (2015) Variational Recurrent Auto-Encoders. arXiv:1412.6581.
[13] (2016) Sequential neural models with stochastic layers. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 2207–2215.
[14] (2018) Deep Latent Variable Models for Sequential Data. Ph.D. Thesis, DTU Compute.
[15] (2018) Maximum Likelihood identification of Wiener-Hammerstein system with process noise. 18th IFAC Symposium on System Identification SYSID 2018 51 (15), pp. 401–406.
[16] (2014) Learning Temporal Dependencies in Data Using a DBN-BLSTM. arXiv:1412.6093.
[17] (2016) Deep Learning. MIT Press.
[18] (2014) Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680.
[19] (2020) Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision. arXiv:1906.01620.
[20] (2015) Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
[21] (1997) Long Short-Term Memory. Neural Computation 9 (8), pp. 1735–1780.
[22] (1999) An Introduction to Variational Methods for Graphical Models. Machine Learning 37 (2), pp. 183–233.
[23] (2015) Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA.
[24] (2014) Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, Canada.
[25] (2004) Parameter estimation in stochastic grey-box models. Automatica 40 (2), pp. 225–237.
[26] (2014) A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters 42, pp. 11–24.
[27] (2015) Deep learning. Nature 521 (7553), pp. 436–444.
[28] (2019) Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, pp. 2901–2908.
[29] (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88.
[30] (2005) IAM-OnDB – an online English sentence database acquired from handwritten text on a whiteboard. In Eighth International Conference on Document Analysis and Recognition (ICDAR'05), Vol. 2, pp. 956–961.
[31] (1978) Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control 23 (5), pp. 770–783.
[32] (2020) Deep Learning and System Identification. 21st IFAC World Congress.
[33] (1987) System identification: theory for the user. Prentice Hall, Englewood Cliffs, NJ.
[34] (2018) System Identification Toolbox: The Manual. 9th edition, The MathWorks Inc., Natick, MA, USA.
[35] (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge, MA.
[36] (1996) Neural networks in control systems. P. Smolensky, M. C. Mozer, and D. E. Rumelhart (Eds.), pp. 347–394.
[37] (2019) Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems 32, pp. 13991–14002.
[38] (2013) The Blizzard Challenge 2013 – Indian Language Tasks.
[39] (1986) An introduction to hidden Markov models. IEEE ASSP Magazine 3 (1), pp. 4–16.
[40] (2018) Deep State Space Models for Time Series Forecasting. In Advances in Neural Information Processing Systems 31, pp. 7785–7794.
[41] (2014) Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In International Conference on Machine Learning, pp. 1278–1286.
[42] (2005) Nonlinear system identification via direct weight optimization. Automatica 41 (3), pp. 475–490.
[43] (1998) EM Algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems 10, pp. 626–632.
[44] (2011) System identification of nonlinear state-space models. Automatica 47 (1), pp. 39–49.
[45] (2017) Wiener-Hammerstein benchmark with process noise. 20th IFAC World Congress 50 (1), pp. 448–453.
[46] (1995) Nonlinear black-box modeling in system identification: a unified overview. Automatica 31 (12), pp. 1691–1724.
[47] (1999) Model on demand: algorithms, analysis and applications. Linköping Studies in Science and Technology, Dissertation, Linköping University.
[48] (2018) Learning of state-space models with highly informative observations: a tempered Sequential Monte Carlo solution. Mechanical Systems and Signal Processing 104, pp. 915–928.
[49] (2017) A flexible state-space model for learning nonlinear dynamical systems. Automatica 80, pp. 189–199.
[50] (2005) Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
[51] (2016) A polynomial nonlinear state-space Matlab toolbox. In Workshop on Nonlinear System Identification Benchmarks, Brussels, Belgium.
[52] (2007) Filtering and System Identification: A Least Squares Approach. Cambridge University Press.
[53] (1994) Identification of the deterministic part of MIMO state space models given in innovations form from input-output data. Automatica 30 (1), pp. 61–74.
[54] (2018) High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 8798–8807.
[55] (2008) Adaptive Hinging Hyperplanes. IFAC Proceedings Volumes 41 (2), pp. 4036–4041.
[56] (2019) Advances in Variational Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (8), pp. 2008–2026.
[57] (2015) Character-level Convolutional Networks for Text Classification. In Advances in Neural Information Processing Systems 28, pp. 649–657.