I Introduction
Inertial sensors have been used for several decades in aerospace systems for attitude control and navigation. Drastic advances in microelectromechanical systems (MEMS) have led to the development of miniaturized strapdown inertial measurement units (IMUs), which have entered a multitude of new application domains, from autonomous drones to ambulatory human motion tracking.
In strapdown IMUs, the angular rate and acceleration – and sometimes also the magnetic field vector – are measured in a sensor-intrinsic three-dimensional coordinate system, which moves along with the sensor. Estimating the orientation, velocity or position of the sensor with respect to some inertial frame requires strapdown integration of the angular rates and sensor fusion of the aforementioned raw measurement signals (cf. Figure 1).
Estimating the orientation of an IMU from its raw measurement signals in real time is a fundamental standard problem of inertial sensor fusion. A large variety of filter algorithms have been proposed previously, some of which are implemented in the motion processing units of modern miniature IMUs. It is well known that the attitude of the sensor can be determined by 6D sensor fusion, i.e. fusing 3D gyroscope and 3D accelerometer readings, while estimating the full orientation (attitude and heading) requires 9D sensor fusion, i.e. using 3D magnetometer readings in addition to the 6D signals.
Existing solutions to inertial attitude estimation are typically model-based and heuristically parameterized. They use mathematical models of measurement errors as well as three-dimensional rotations and transformations of the gravitational acceleration. They require a reasonable choice of covariance matrices, fusion weights or parameters that define how the weights are adjusted. While considerably high accuracy has been achieved with such approaches in many application domains, it is also well known that different parameterizations perform differently well for different types of motions and disturbances. In fact, to the best of our knowledge, there is to date no filter algorithm that yields consistently small errors across all types of motion that a MEMS-based IMU might perform.
Abundant research has demonstrated the capabilities of artificial neural networks in providing data-driven solutions to problems that have conventionally been addressed by model-based approaches. If sufficiently large amounts of data and computational capacity are available, generally usable solutions may be found for most problems. While ample work has shown that a number of problems can also be solved using neural networks, the practically more relevant question of whether neural networks can outperform conventional solutions often remains unanswered.
In the present work, we investigate whether a neural network can solve the real-time attitude estimation task with similar or even better performance than a state-of-the-art inertial orientation estimation filter. Moreover, we analyze at what cost this can be achieved, in terms of the required number of data sets as well as the required complexity and application-specific structure of the neural network.
II Related Work
We first briefly review the state of the art in real-time attitude estimation from inertial sensor signals and then describe previous work on the use of artificial neural networks for inertial motion analysis.
II-A Inertial Attitude Estimation
As mentioned above, the attitude of an IMU can be determined by sensor fusion of the accelerometer and gyroscope readings. Accelerometers yield accurate attitude information in static conditions, i.e. when the sensor rests or moves with constant velocity. Under dynamic conditions, however, their readings are only useful under certain assumptions, for example that the average change of velocity is zero on sufficiently large time scales. Gyroscopes yield highly accurate information on the change of attitude. However, pure strapdown integration of the angular rates is prone to drift resulting from measurement bias, noise, clipping and undersampling. Accurate attitude estimation under non-static conditions therefore requires sensor fusion of both 3D signals.
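As an illustration of the static case, the inclination (roll and pitch) can be recovered directly from a single accelerometer sample; the axis conventions and gravity sign in this sketch are assumptions, and the heading remains unobservable:

```python
import math

def attitude_from_accel(ax, ay, az):
    """Roll and pitch (rad) from one static accelerometer sample.

    Assumes the sensor measures the reaction to gravity along its
    intrinsic axes, i.e. a level, resting sensor reads (0, 0, +g)."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.sqrt(ay**2 + az**2))
    return roll, pitch

# A level, resting sensor: gravity acts only on the z-axis.
roll, pitch = attitude_from_accel(0.0, 0.0, 9.81)
```

Under dynamic conditions this computation breaks down, because the measured specific force then also contains the acceleration of the motion itself.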
A number of different solutions have been proposed for this task. Categorizations and comparisons of different algorithms can be found, for example, in [2, 3]. Most filters use either an extended Kalman filter scheme or a complementary filter scheme, and unit quaternions are a common choice for the mathematical representation of the three-dimensional orientation. The balance between gyroscope-based strapdown integration and accelerometer-based drift correction is typically adjusted to the specific application by manual tuning of covariance matrices or other fusion weights. Methods have been proposed that analyze the accelerometer norm to distinguish static and dynamic motion phases and adjust the fusion weights in real time.
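The balance between strapdown integration and drift correction can be illustrated with a one-dimensional complementary filter; the quaternion-based filters discussed here follow the same principle, and the time constant and initial value below are illustrative:

```python
def complementary_filter(gyro, accel_angle, dt, tau=0.5):
    """1D complementary filter: gyroscope integration dominates on short
    time scales, while the accelerometer-derived angle slowly corrects
    the drift. tau (s) sets the crossover between the two sources."""
    k = tau / (tau + dt)  # fusion weight derived from the time constant
    angle = 0.0           # initial estimate (illustrative)
    estimates = []
    for w, a in zip(gyro, accel_angle):
        angle = k * (angle + w * dt) + (1.0 - k) * a
        estimates.append(angle)
    return estimates

# Gyroscope reads zero but the accelerometer indicates 1 rad:
# the estimate slowly converges towards the accelerometer angle.
est = complementary_filter([0.0] * 500, [1.0] * 500, dt=0.01)
```

A larger time constant trusts the gyroscope longer; a smaller one lets the accelerometer correct faster but admits more motion-induced disturbance.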
A rather recently developed quaternion-based orientation estimation filter is described in [4]. It uses geodetic accelerometer-based correction steps and optional magnetometer-based correction steps for heading estimation. The correction steps are parameterized by intuitively interpretable time constants, which are adjusted automatically if the accelerometer norm is far from the static value or has been close to that value for several consecutive time steps. The performance of this filter and five other state-of-the-art filters has recently been evaluated across a wide range of motions. For all filters, errors between two and five degrees were found for different speeds of motion [5]. To the best of our knowledge, a significantly more accurate solution for attitude estimation in MEMS-based IMUs does not exist.
II-B Neural Networks for Attitude Estimation
In inertial motion tracking, neural networks have mostly been applied to augment existing conventional filter solutions. In [6], a recurrent neural network (RNN) is used for movement detection in order to decide which Kalman filter should be applied to the current system state. In [7], a feed-forward neural network is used for smoothing the output of a Kalman filter, while an RNN is used for preprocessing the Kalman filter inputs in [8]. A similar approach is used in [9], where a convolutional neural network is used for error correction of the gyroscope signal as part of a strapdown integration. In [10] and [11], RNNs are used as black boxes for the orientation integration over time. While the former uses a combination of gyroscope and visual data, the latter relies only on the gyroscope and achieves similar results. In a few more recent works, neural networks have been applied directly as black boxes to angle estimation problems. In [12], an RNN is used for human limb assignment and orientation estimation of IMUs that are attached to human limbs. It achieved high accuracy at the assignment problem but was only partially successful at the orientation estimation problem. In [13], a bidirectional RNN is used for velocity and heading estimation on a two-dimensional plane in polar coordinates.
To conclude, an end-to-end neural network model for IMU-based attitude estimation has not been developed yet. All of the presented neural networks are either an addition to classical attitude estimation filters or address different problems.
III Problem Statement
Consider an inertial sensor with an intrinsic right-handed coordinate system. Neglect the rotation of the Earth and define an inertial frame of reference with a vertical z-axis. The orientation of the sensor with respect to the reference frame is then described by the rotation between both coordinate systems, which can be expressed as a unit quaternion, a rotation matrix, a set of three Euler angles, or a single angle and a corresponding rotation axis. Both frames are said to have the same attitude if the axis of that rotation is vertical.
If the true orientation of the sensor is given by the unit quaternion $\mathbf{q}$ and an attitude estimation algorithm yields an estimate $\hat{\mathbf{q}}$, then $\mathbf{q}_\mathrm{err} = \hat{\mathbf{q}} \otimes \mathbf{q}^{-1}$ is the estimation error quaternion expressed in reference frame axes. The attitude estimation is said to be perfect if the estimated orientation is correct up to a rotation around the vertical axis. This is the case if the rotation axis of $\mathbf{q}_\mathrm{err}$ is vertical. If that axis is not vertical, then $\mathbf{q}_\mathrm{err}$ can be decomposed into a rotation around the vertical axis and a rotation around a horizontal axis. For any given $\mathbf{q}_\mathrm{err}$ with real part $w_\mathrm{err}$ and third imaginary part $z_\mathrm{err}$, the smallest possible rotation angle of the horizontal rotation is $\alpha = 2\arccos\big(\sqrt{w_\mathrm{err}^2 + z_\mathrm{err}^2}\big)$. This corresponds to the smallest rotation by which one would need to correct the estimate to make its attitude error zero in the aforementioned sense.
These definitions allow us to formulate the following attitude estimation problem: Given a sampled sequence of three-dimensional accelerometer and gyroscope readings of a MEMS-based IMU moving freely in three-dimensional space, estimate the attitude of that IMU with respect to the reference frame at each sampling instant based only on current and previous samples. Denote the sensor readings by $\mathbf{a}(n)$ and $\boldsymbol{\omega}(n)$, respectively, with $n \in \{1, \dots, N\}$ being the discrete time index and $N$ the number of samples. The desired algorithm should then yield a sampled sequence of estimates $\hat{\mathbf{q}}(n)$ with a preferably small cumulative attitude estimation error defined by
$\mathbf{q}_\mathrm{err}(n) = \hat{\mathbf{q}}(n) \otimes \mathbf{q}(n)^{-1}$,  (1)
$\alpha(n) = 2\arccos\big(\sqrt{w_\mathrm{err}(n)^2 + z_\mathrm{err}(n)^2}\big)$,  (2)
$e = \frac{1}{N}\sum_{n=1}^{N} \alpha(n)$,  (3)
where $\mathbf{q}(n)$ is the true orientation of the sensor at time $n$, and $w_\mathrm{err}(n)$ and $z_\mathrm{err}(n)$ are the real part and third imaginary part of $\mathbf{q}_\mathrm{err}(n)$. In the following sections, we aim to develop an artificial neural network that solves the given problem and compare it to an established attitude estimation filter.
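The error metric of the problem statement can be computed directly from quaternions; the sketch below assumes quaternions stored as (w, x, y, z) tuples:

```python
import math

def quat_mult(p, q):
    """Hamilton product of two quaternions (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def quat_inv(q):
    """Inverse of a unit quaternion is its conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def attitude_error(q_true, q_est):
    """Smallest rotation angle (rad) about a horizontal axis that makes
    the estimated attitude correct; rotations about the vertical axis
    (heading errors) do not contribute."""
    w, _, _, z = quat_mult(q_est, quat_inv(q_true))
    # min() guards against round-off pushing the argument slightly above 1
    return 2.0 * math.acos(min(1.0, math.sqrt(w * w + z * z)))
```

A pure heading error, e.g. a 90° rotation about the vertical axis, yields zero attitude error, while a 90° rotation about a horizontal axis yields an error of π/2.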
IV Neural Network Model
In this work, a neural network model is implemented following state-of-the-art best practices for time series. Building upon that, further optimizations are introduced that utilize domain-specific knowledge.
IV-A Neural Network Structure with General Best Practices
The performance of a neural network model depends on the model architecture and the training process. First, we identify potential model architectures for attitude estimation. After that, we develop an optimized training process for these architectures.
The model architecture consists of multiple layers that may be connected in multiple ways, leading to different characteristics. First, a method for modelling the dynamic system state has to be chosen. A common practice is to connect the model output to the model input, creating an autoregressive model that stores the system state information in the single autoregressive connection. For longer sequences, the autoregressive model's inherent sequential nature prevents parallelization and therefore an efficient use of hardware acceleration, which slows down the training. Using neural network layers that are able to model system states avoids the need for autoregression in dynamic systems. The most commonly used such layers are recurrent neural networks (RNNs) and temporal convolutional networks (TCNs).
RNNs have recurrent connections between samples in their hidden activations for modelling the state of a dynamic system. There are different variants of RNNs, with Long Short-Term Memories (LSTMs) being the most prevalent [14]. LSTMs add three gates to the classical RNN, which regulate the information flow of its hidden activations. This stabilizes the stored state and enables application to systems with long-term dependencies, such as integrating movements over long periods of time. Because LSTMs are prone to overfitting, several regularization methods for sequential neural networks have been developed [15]. Increasing the amount of regularization together with the model size is the main approach for improving a neural network without domain-specific knowledge. In the present work, we use a two-layer LSTM model with a hidden size of 200 for each layer and a final linear layer that reduces the hidden activation count to four. These four activations represent the elements of the estimated attitude quaternion. In order to always generate a unit quaternion, the elements are divided by their Euclidean norm. The structure of the RNN model used in this work is visualized in Figure 2.
An alternative approach to RNNs for sequential data are TCNs. TCNs are causal one-dimensional dilated convolutional neural networks with receptive fields large enough to model the system dynamics [16]. The main advantage of TCNs compared to RNNs is their pure feed-forward nature. Having no sequential dependencies leads to parallelizability and therefore fast training on hardware accelerators [17]. The TCN's receptive field describes the number of samples taken into account for predicting a sample. Because TCNs are stateless, the receptive field needs to be large enough to implicitly estimate the system state from the input signals. Because of the dilated convolutional layers, the receptive field grows exponentially with the depth of the neural network, allowing for large windows with a manageable number of layers. In the present work, we use a 10-layer TCN with a receptive field of samples and a hidden size of 200 for each layer. The structure of that TCN model is visualized in Figure 3.
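The described RNN model can be sketched in PyTorch as follows; the layer sizes are taken from the text, while the input size of 6 (stacked 3D accelerometer and gyroscope samples) is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class AttitudeRNN(nn.Module):
    """Two-layer LSTM (hidden size 200) followed by a linear layer that
    reduces the hidden activations to the four quaternion elements,
    which are then normalized to unit length."""

    def __init__(self, input_size=6, hidden_size=200):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 4)

    def forward(self, x, state=None):
        h, state = self.lstm(x, state)
        q = self.head(h)
        # dividing by the Euclidean norm always yields a unit quaternion
        return q / q.norm(dim=-1, keepdim=True), state

model = AttitudeRNN()
quats, _ = model(torch.randn(2, 50, 6))  # 2 sequences of 50 samples each
```

Returning the LSTM state allows it to be carried over between windows during truncated training and between calls during real-time inference.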
For the linear and convolutional layers, batch normalization (batchnorm) [18] is used. Batchnorm standardizes the layer activations, enabling larger learning rates and better generalization. Instead of the commonly used sigmoid or rectified linear unit activation functions, we use Mish, which has achieved state-of-the-art results in multiple domains [19]. Mish combines the advantages of both activation functions. On the one hand, it is unbounded in the positive direction and thus avoids saturation like rectified linear units. On the other hand, it is smooth like sigmoid functions, which improves gradient-based optimization.
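Mish has the closed form x · tanh(softplus(x)); a minimal scalar sketch:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Unbounded in the positive
    direction (avoiding saturation) yet smooth everywhere."""
    softplus = math.log1p(math.exp(x))
    return x * math.tanh(softplus)
```

For large positive inputs Mish approaches the identity, like a rectified linear unit, while for negative inputs it is bounded and smooth.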
For training, long overlapping sequences are extracted from the measured sequences, so that the neural network is initialized with different states. Because RNNs can only be reasonably trained with a limited number of time steps per minibatch, truncated backpropagation through time is used [20]. That means that each long sequence is split into shorter windows that are used for training, with the hidden state of the RNN transferred between minibatches. The measured sequences are standardized with the same mean and standard deviation values to improve training stability [18].
The main component of the training process is the optimizer. We use a combination of RAdam and Lookahead, which has proven effective at several tasks [21], [22]. For the training process, we used the Fastai 2 API, which is built upon PyTorch [23]. One of the most important hyperparameters for training a neural network is the learning rate of the optimizer. We choose the maximum learning rate with the learning rate finder heuristic [24] and use cosine annealing for faster convergence [25]. The learning rate finder heuristic determines the maximum learning rate by exponentially increasing the learning rate in a dummy training run and finding the point at which the loss has the steepest gradient. Cosine annealing starts with the maximum learning rate, keeps it constant for a given number of epochs, and then decreases it over time following a cosine curve.
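The flat-then-cosine schedule described above can be sketched as a function of the epoch; the number of flat epochs is an illustrative parameter:

```python
import math

def flat_cos_lr(epoch, total_epochs, lr_max, flat_epochs=10):
    """Hold lr_max for flat_epochs, then anneal to zero along a half
    cosine over the remaining epochs."""
    if epoch < flat_epochs:
        return lr_max
    progress = (epoch - flat_epochs) / (total_epochs - flat_epochs)
    return lr_max * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The large initial rate allows fast progress early on, while the cosine decay lets the optimizer settle into a minimum at the end of training.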
The other hyperparameters of the neural network model, such as activation dropout and weight dropout, form a vast optimization space. To find a well-performing configuration, we use population-based training [26]. It is an evolutionary hyperparameter optimization algorithm that is parallelizable and computationally efficient. It creates a population of neural networks with different hyperparameters and trains them for a few epochs. Then the hyperparameters and weights of the best-performing models override those of the worst-performing ones, and minor hyperparameter variations are introduced. Repeating this process quickly yields a well-performing solution.
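One exploit/explore step of population-based training might look like the following toy sketch; the member representation and the perturbation factor are assumptions for illustration:

```python
import random

def pbt_step(population, evaluate, perturb=1.2, rng=random):
    """Exploit/explore step: members in the worse half copy the weights
    of a member from the better half (exploit) and adopt slightly
    perturbed hyperparameters (explore). Returns members ranked best-first."""
    ranked = sorted(population, key=evaluate, reverse=True)
    half = len(ranked) // 2
    for loser, winner in zip(ranked[half:], ranked[:half]):
        loser["weights"] = dict(winner["weights"])            # exploit
        loser["params"] = {k: v * rng.choice((1 / perturb, perturb))
                           for k, v in winner["params"].items()}  # explore
    return ranked

# Toy population: lower dropout is scored as better here.
pop = [{"params": {"dropout": 0.1 * (i + 1)}, "weights": {"w": i}}
       for i in range(4)]
ranked = pbt_step(pop, evaluate=lambda m: -m["params"]["dropout"])
```

In a real setup, `evaluate` would train each member for a few epochs and return its validation score before the exploit/explore step is repeated.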
IV-B Loss Function
The output of the model is a quaternion that describes the attitude of the sensor. The loss function describes the accumulated error between the estimated and the ground-truth values. In most cases, the mean squared error between the estimated and reference values is taken. In the present case, an element-wise mean squared error of the quaternion is not a reasonable choice, since the orientation cannot be estimated unambiguously with only accelerometer and gyroscope signals – a magnetometer would be necessary. An obvious solution would be to choose the loss function equal to the attitude error function

$\mathrm{loss} = \frac{1}{N}\sum_{n=1}^{N} \alpha(n)$,  (4)

with

$\mathbf{q}_\mathrm{err}(n) = \hat{\mathbf{q}}(n) \otimes \mathbf{q}(n)^{-1}$,  (5)
$\alpha(n) = 2\arccos\big(\sqrt{w_\mathrm{err}(n)^2 + z_\mathrm{err}(n)^2}\big)$.  (6)
However, experiments show that using this error definition leads to unstable training resulting from an exploding-gradient problem. This is caused by the arccos function, whose derivative explodes for arguments approaching 1, which is the target of the optimization problem:

$\frac{\mathrm{d}}{\mathrm{d}x}\arccos(x) = -\frac{1}{\sqrt{1-x^2}}$,  (7)
$\lim_{x \to 1^-} \left|\frac{\mathrm{d}}{\mathrm{d}x}\arccos(x)\right| = \infty$.  (8)
Truncating the argument close to 1 leads to a solution that is numerically stable with rare exceptions. Replacing the arccos function with a linear term avoids the exploding gradient completely while keeping the monotonicity and the correlation with the attitude error:

$\mathrm{loss}_\mathrm{lin} = \frac{1}{N}\sum_{n=1}^{N} \Big(1 - \sqrt{w_\mathrm{err}(n)^2 + z_\mathrm{err}(n)^2}\Big)$.  (9)
Figure 4 visualizes the differences between both functions and their gradients.
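Both per-sample error terms can be written down directly; the clamp value and the exact form of the linear surrogate are assumptions of this sketch:

```python
import math

def attitude_term_acos(w, z, eps=1e-7):
    """Per-sample attitude error via arccos; clamping the argument below 1
    keeps the gradient finite near the optimum (eps is illustrative)."""
    return 2.0 * math.acos(min(math.sqrt(w * w + z * z), 1.0 - eps))

def attitude_term_linear(w, z):
    """Linear surrogate: monotone in the same argument and zero for a
    perfect attitude, so it preserves the ranking of errors while
    avoiding the exploding gradient."""
    return 1.0 - math.sqrt(w * w + z * z)
```

Both terms vanish for a perfect attitude and grow as the estimate deviates, but only the surrogate has a bounded derivative near the optimum.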
IV-C Data Augmentation
Data augmentation is a method for increasing the size of a given dataset by introducing domain knowledge. It is a regularization method that improves the generalizability of a model and has already been applied successfully in computer vision [28] and audio modelling [29]. In the case of the present attitude estimation task, we virtually rotate the IMU by transforming the measured accelerometer, gyroscope and reference attitude data by a randomly generated unit quaternion. Thereby, orientation invariance of the sensor measurements is introduced into the model.
IV-D Grouped Input Channels
The default way of processing a multivariate time series is to feed all input signals into the same layer. An alternative is to create groups of signals that interact with each other and disconnect them from those they do not need to interact with. The idea is to alleviate the neural network's effort in finding interactions between signals. This method has been applied previously to other tasks, but without an analysis of its impact on performance [30], [10]. In the present application, the accelerometer and gyroscope signals are grouped separately, with the accelerometer providing attitude information at large time scales and the gyroscope providing accurate information on the change of orientation, as visualized in Figure 6.
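The virtual rotation used for data augmentation (Section IV-C) can be sketched as follows; the same random quaternion would be applied to the accelerometer, gyroscope and reference attitude of a recording:

```python
import math
import random

def random_unit_quaternion(rng=random):
    """Uniformly distributed random unit quaternion (w, x, y, z)
    via Shoemake's method."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2 * math.pi * u2),
            a * math.cos(2 * math.pi * u2),
            b * math.sin(2 * math.pi * u3),
            b * math.cos(2 * math.pi * u3))

def rotate_vector(q, v):
    """Rotate the 3D vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # v' = v + w*t + cross((x, y, z), t)  with  t = 2*cross((x, y, z), v)
    t = (2 * (y * v[2] - z * v[1]),
         2 * (z * v[0] - x * v[2]),
         2 * (x * v[1] - y * v[0]))
    return (v[0] + w * t[0] + y * t[2] - z * t[1],
            v[1] + w * t[1] + z * t[0] - x * t[2],
            v[2] + w * t[2] + x * t[1] - y * t[0])
```

Applying one freshly drawn quaternion per training sequence yields a new, physically consistent recording of the same motion seen from a rotated sensor.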
V Experiments
The performance of the proposed neural network is compared to that of an established attitude estimation filter in experiments with a ground truth based on marker-based optical motion tracking. A MEMS-based IMU (aktos-t, Myon AG, Switzerland) is rigidly attached to a 3D-printed X-shaped structure with three reflective markers whose positions are tracked at millimeter accuracy by a multi-camera system (OptiTrack, NaturalPoint Inc., USA). For each moment in time, the three-dimensional marker positions are used to determine a ground-truth sensor orientation with sub-degree accuracy.
To analyze the algorithm performance across different types of motions and different levels of static or dynamic activity, we consider a large number of data sets from different experiments with the following characteristics:

rotation: The IMU is rotating freely in threedimensional space while remaining close to the same point in space.

translation: The IMU is translating freely in threedimensional space while remaining in almost the same orientation.

arbitrary: The IMU is rotating and translating freely in threedimensional space.

slow versus medium versus fast: The speed of the motion is varied between three different levels.

paused versus non-stop: The motion is paused every thirty seconds and continued after a ten-second break, or it is performed non-stop for the entire duration of the five-minute recording.
Different combinations of these characteristics lead to a diverse dataset of 15 recordings, each of which contains more than 50,000 samples of accelerometer and gyroscope readings and ground-truth orientation at a sampling rate of 286 Hz. Figure 5 shows the Euclidean norms of the three axes of the acceleration (acc) and angular rate (gyr) signals over time for three experiments with different combinations of the described characteristics.
The experimental data is used to validate and compare the following two attitude estimation algorithms:

Baseline: a quaternion-based attitude estimation filter with accelerometer-based correction steps and automatic fusion weight adaptation [4]. The filter time constant and weight adaptation gain are numerically optimized to yield the best performance across all data sets.

Neural Network (NN): The proposed neural network is trained on a subset of the available (augmented) data sets and validated on the complementary set of data.
The characteristics of applying neural networks to the attitude estimation problem are analyzed in three experiments. The first one compares the performance of the optimized neural network with that of the filter. The second one is an ablation study that quantifies the effect of each optimization and compares the performance of the RNN and TCN models. The last experiment analyzes the effect of scaling the size of the neural network.
V-A Performance Analysis
In order to compare the performance of the proposed neural network model with that of the filter, the 15 recordings are used for leave-one-out cross-validation. That means that the model is trained on 14 recordings and validated on the one that was left out. This increases the computation time, because a new independent model has to be trained for every recording, but it provides a better view of the generalizability of the model architecture. The neural network used is the RNN with all the proposed optimizations applied.
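The split scheme can be sketched as:

```python
def leave_one_out_splits(recordings):
    """Leave-one-out cross-validation: each recording is held out once for
    validation while the remaining ones form the training set."""
    return [(recordings[:i] + recordings[i + 1:], recordings[i])
            for i in range(len(recordings))]

# With 15 recordings this yields 15 splits of 14 training recordings each.
splits = leave_one_out_splits([f"rec{i:02d}" for i in range(15)])
```

Each split trains one independent model, so the validation recording is always unseen by the model that is evaluated on it.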
The boxplot in Figure 7 compares the error distributions of the 15 recordings between the neural network and the baseline filter. It shows (1) that the neural network has a better average performance and (2) that it performs more consistently in difficult cases, exhibiting clearly smaller maximum errors. The performance comparison for each individual recording is visualized in Figure 8. It shows that, in the slow cases, both methods perform similarly, while the baseline filter sometimes diverges in the fast and arbitrary-motion cases. This diverging behaviour may be observed in Figure 5 in the fast arbitrary-motion case. Between the movements, when the IMU is resting, the algorithms use the gravitational acceleration to quickly converge towards the true attitude. Overall, the neural network outperforms the baseline filter significantly, which is even more remarkable in light of the fact that the baseline filter has been optimized on the whole dataset, while the neural network has never seen any of the validation data.
V-B Ablation Study
In the ablation study, the effect of each domain-specific optimization on the performance of the neural network is analyzed. Furthermore, the performance of the RNN and TCN architectures on the attitude estimation problem is compared. In this study, the 15 recordings are split into 12 training recordings and 3 validation recordings. In order to be representative, the validation recordings are the ones that yielded the maximum, minimum and median error in the performance analysis. Starting from the RNN and TCN architectures with current best practices for time series as base models, the three domain-specific optimizations are added iteratively. First, the element-wise mean-squared-error loss is replaced by the optimized attitude error with smooth L1 loss. In the second step, the data augmentation, which simulates a rotated IMU, is added. In the last step, the input layers are grouped into accelerometer and gyroscope signals.
The results of the study are visualized in Figure 9. Without the optimizations, the RNN and TCN models perform at a similar level. However, after adding the optimizations, the RNN has a much smaller error. This is plausible, since the TCN is limited to its receptive field, while the RNN can track the IMU movement for an indefinite time with its hidden states. Even when the TCN's receptive field is extended to samples, which corresponds to a time window of more than seconds, the results stay the same. When the IMU moves for a longer duration than the time window, the estimation diverges. For this application, especially with real-time use in mind, the RNN is the better approach.
The second result is that all the optimizations improve both the RNN and the TCN. Grouping the inputs consistently leads to minor improvements, while the loss optimization and the data augmentation have a significant impact on the performance. When the data augmentation is added to the model, the other general regularization methods need to be reduced or deactivated in order to avoid over-regularization. Training and validation loss drop at the same pace, which shows that the augmentation is very effective at regularizing the model. The same effect could probably be achieved by increasing the size of the dataset by several orders of magnitude, which would require far more costly recordings.
The final result is that both the loss optimization and the data augmentation are necessary to outperform the baseline filter. Without these domain-specific optimizations, even the highly optimized general-purpose neural networks do not generalize well enough. If all aforementioned optimizations are applied, the neural network performs significantly better than the baseline filter.
V-C Model Size Analysis
In order to analyze the effect of the model size on the attitude error, the RNN model of the first experiment is applied to the 12 training and 3 validation recordings of the second experiment. The number of neurons in each layer of the RNN is scaled from 10 to 200, and the attitude error is compared.
The results of the study are visualized in Figure 10. As expected, the error decreases with increasing hidden size, with diminishing returns at larger neuron counts. In this example, 20 neurons per layer are already enough to achieve the same mean attitude error as the baseline filter. Decreasing the hidden size of RNNs helps to reduce the memory footprint and overall computation time, which is important for embedded systems. However, it only marginally reduces the training and prediction time on hardware accelerators with high parallelization capabilities, because of the RNN's sequential nature.
VI Conclusion
This work has shown that neural networks are a potent tool for IMU-based real-time attitude estimation. If domain-specific optimizations are in place, large recurrent neural networks can outperform state-of-the-art nonlinear attitude estimation filters. These optimizations require knowledge about the process that the neural network identifies. However, they do not require the specific knowledge (equations, signal characteristics, parameters) that is needed for implementing a well-performing filter. Another requirement for the neural-network-based solution is a sufficiently rich set of data with ground-truth attitude. However, data augmentation was shown to reduce this demand significantly.
Leave-one-out cross-validation was used to show that the trained network performs well on new data from the types of motion used for training. Future research will focus on applying the trained network to data from different IMUs with different sampling rates and different error characteristics. This will answer the question of whether a sufficiently trained neural network can be used as a competitive solution in new sensor and environment settings without the need for collecting and using new training data.
Funding
This work was partly funded by the German Federal Ministry of Education and Research (BMBF, Funding number: 16EMO0262).
References
 [1] J. Beuchert, F. Solowjow, S. Trimpe, and T. Seel, “Overcoming Bandwidth Limitations in Wireless Sensor Networks by Exploitation of Cyclic Signal Patterns: An Event-triggered Learning Approach,” Sensors, vol. 20, no. 1, p. 260, Jan. 2020. [Online]. Available: https://www.mdpi.com/14248220/20/1/260
 [2] M. Caruso, A. M. Sabatini, M. Knaflitz, M. Gazzoni, U. D. Croce, and A. Cereatti, “Accuracy of the Orientation Estimate Obtained Using Four Sensor Fusion Filters Applied to Recordings of Magneto-Inertial Sensors Moving at Three Rotation Rates,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Berlin, Germany: IEEE, Jul. 2019, pp. 2053–2058. [Online]. Available: https://ieeexplore.ieee.org/document/8857655/
 [3] L. Ricci, F. Taffoni, and D. Formica, “On the Orientation Error of IMU: Investigating Static and Dynamic Accuracy Targeting Human Motion,” PLoS ONE, vol. 11, no. 9, Sep. 2016. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5017605/
 [4] T. Seel and S. Ruppin, “Eliminating the Effect of Magnetic Disturbances on the Inclination Estimates of Inertial Sensors,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 8798–8803, Jul. 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2405896317321201
 [5] M. Caruso, D. Laidig, A. M. Sabatini, M. Knaflitz, U. D. Croce, T. Seel, and A. Cereatti, “Comparison of the performances of six magneto-inertial sensor fusion filters for orientation estimation in movement analysis,” in submitted to the 7th Congress of the National Group of Bioengineering (GNB), 2020.
 [6] M. Brossard, A. Barrau, and S. Bonnabel, “RINS-W: Robust Inertial Navigation System on Wheels,” arXiv:1903.02210 [cs], Feb. 2020. [Online]. Available: http://arxiv.org/abs/1903.02210
 [7] K.-W. Chiang, H.-W. Chang, C.-Y. Li, and Y.-W. Huang, “An Artificial Neural Network Embedded Position and Orientation Determination Algorithm for Low Cost MEMS INS/GPS Integrated Sensors,” Sensors, vol. 9, no. 4, pp. 2586–2610, Apr. 2009. [Online]. Available: https://www.mdpi.com/14248220/9/4/2586

 [8] J. R. Rambach, A. Tewari, A. Pagani, and D. Stricker, “Learning to Fuse: A Deep Learning Approach to Visual-Inertial Camera Pose Estimation,” in 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Merida, Yucatan, Mexico: IEEE, Sep. 2016, pp. 71–76. [Online]. Available: http://ieeexplore.ieee.org/document/7781768/
 [9] M. Brossard, S. Bonnabel, and A. Barrau, “Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation,” arXiv:2002.10718 [cs, stat], Feb. 2020. [Online]. Available: http://arxiv.org/abs/2002.10718
 [10] M. A. Esfahani, H. Wang, K. Wu, and S. Yuan, “AbolDeepIO: A Novel Deep Inertial Odometry Network for Autonomous Vehicles,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–10, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8693766/
 [11] ——, “OriNet: Robust 3D Orientation Estimation With a Single Particular IMU,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 399–406, Apr. 2020. [Online]. Available: https://ieeexplore.ieee.org/document/8931590/
 [12] T. Zimmermann, B. Taetz, and G. Bleser, “IMU-to-Segment Assignment and Orientation Alignment for the Lower Body Using Deep Learning,” Sensors, vol. 18, no. 1, p. 302, Jan. 2018. [Online]. Available: https://www.mdpi.com/14248220/18/1/302
 [13] C. Chen, X. Lu, J. Wahlstrom, A. Markham, and N. Trigoni, “Deep Neural Network Based Inertial Odometry Using Low-cost Inertial Measurement Units,” IEEE Transactions on Mobile Computing, pp. 1–1, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8937008/
 [14] J. Gonzalez and W. Yu, “Non-linear system modeling using LSTM neural networks,” IFAC-PapersOnLine, vol. 51, no. 13, pp. 485–489, Jan. 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2405896318310814
 [15] S. Merity, N. S. Keskar, and R. Socher, “Regularizing and Optimizing LSTM Language Models,” arXiv:1708.02182 [cs], Aug. 2017. [Online]. Available: http://arxiv.org/abs/1708.02182
 [16] C. Andersson, A. H. Ribeiro, K. Tiels, N. Wahlström, and T. B. Schön, “Deep Convolutional Networks in System Identification,” arXiv:1909.01730 [cs, eess, stat], Sep. 2019. [Online]. Available: http://arxiv.org/abs/1909.01730
 [17] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv:1609.03499 [cs], Sep. 2016. [Online]. Available: http://arxiv.org/abs/1609.03499
 [18] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs], Mar. 2015. [Online]. Available: http://arxiv.org/abs/1502.03167
 [19] D. Misra, “Mish: A Self Regularized Non-Monotonic Neural Activation Function,” arXiv:1908.08681 [cs, stat], Oct. 2019. [Online]. Available: http://arxiv.org/abs/1908.08681
 [20] C. Tallec and Y. Ollivier, “Unbiasing Truncated Backpropagation Through Time,” arXiv:1705.08209 [cs], May 2017. [Online]. Available: http://arxiv.org/abs/1705.08209
 [21] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the Variance of the Adaptive Learning Rate and Beyond,” arXiv:1908.03265 [cs, stat], Aug. 2019. [Online]. Available: http://arxiv.org/abs/1908.03265
 [22] M. R. Zhang, J. Lucas, G. Hinton, and J. Ba, “Lookahead Optimizer: k steps forward, 1 step back,” arXiv:1907.08610 [cs, stat], Jul. 2019. [Online]. Available: http://arxiv.org/abs/1907.08610
 [23] J. Howard and S. Gugger, “Fastai: A Layered API for Deep Learning,” Information, vol. 11, no. 2, p. 108, Feb. 2020. [Online]. Available: https://www.mdpi.com/20782489/11/2/108
 [24] L. N. Smith, “Cyclical Learning Rates for Training Neural Networks,” arXiv:1506.01186 [cs], Apr. 2017. [Online]. Available: http://arxiv.org/abs/1506.01186
 [25] I. Loshchilov and F. Hutter, “SGDR: Stochastic Gradient Descent with Warm Restarts,” arXiv:1608.03983 [cs, math], May 2017. [Online]. Available: http://arxiv.org/abs/1608.03983
 [26] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, C. Fernando, and K. Kavukcuoglu, “Population Based Training of Neural Networks,” arXiv:1711.09846 [cs], Nov. 2017. [Online]. Available: http://arxiv.org/abs/1711.09846

 [27] Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu, “Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, Jun. 2018, pp. 2235–2245. [Online]. Available: https://ieeexplore.ieee.org/document/8578336/
 [28] L. Perez and J. Wang, “The Effectiveness of Data Augmentation in Image Classification using Deep Learning,” arXiv:1712.04621 [cs], Dec. 2017. [Online]. Available: http://arxiv.org/abs/1712.04621
 [29] X. Cui, V. Goel, and B. Kingsbury, “Data Augmentation for Deep Neural Network Acoustic Modeling,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 9, pp. 1469–1477, Sep. 2015. [Online]. Available: http://ieeexplore.ieee.org/document/7113823/
 [30] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao, “Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks,” in Web-Age Information Management, ser. Lecture Notes in Computer Science, vol. 8485. Cham: Springer International Publishing, 2014, pp. 298–310.