Traditionally, the graphical user interface of a computer, or the actions of a robot or drone, is controlled with hand or arm gestures that interact with a physical controller, such as a mouse in the case of traditional 2D screens. The touch sensor of a touch screen can also be thought of as a physical controller. Wearable devices make it possible to build a Human-Computer Interface (HCI) that offers universal, natural and easy-to-use interaction with machines. With the advent of wearables there is an opportunity to get rid of the physical controller and interact with the computer without a proxy.
Sensing hand gestures without a physical proxy can be done by means of wearables or by means of image or video analysis of hand or finger motion. Wearable-based detection can rely on measuring the acceleration and rotation of body parts (arms, hands or fingers) with Inertial Measurement Unit (IMU) sensors, or on measuring the myoelectric signals generated by the muscles of the arms or fingers with electromyography (EMG) sensors. Surface EMG (sEMG) records muscle activity from the surface of the skin above the muscle being evaluated; the signal is collected via surface electrodes. Both types of sensors have their own application areas:
IMU sensors are typically used for detecting large movements and they are not suitable for recognizing fine gestures such as spread fingers or finger pinching,
EMG devices are typically used for gesture recognition.
Usually, both types of sensor data are needed to achieve a good user experience from an HCI point of view.
In this paper we focus on the main challenges of sEMG-based gesture recognition. This translates to a time series classification task; it is an active research topic, and several papers provide solutions ranging from classic data science methods to deep learning classifiers. sEMG signals depend strongly on:
The subject under test,
Physical conditions of the subject (e.g., skin conductivity),
External/measurement conditions (e.g., sensor placement accuracy).
If these dependencies do not come into play, as in a scenario where gestures are recognized for one subject in one session without the device being removed from the skin, the accuracy of state-of-the-art classifiers is above 90%. If any or all of these conditions are not met, the accuracy of gesture recognition degrades to below 50%. In this paper we propose a domain adaptation model that handles these dependencies efficiently.
This paper is organized as follows. Section II provides a short summary of the technologies used, then our adaptation model is introduced in Section III. Next, we validate our approach using publicly available sEMG datasets: the experimental setup is described in Section IV, and Section V gives a detailed analysis of the experiments. Finally, we conclude and summarize our results.
II Related Work
In this section, the techniques and technologies used are introduced:
Main properties of sEMG signals and sensors,
sEMG-based gesture recognition techniques,
Recurrent Neural Networks (RNN).
II-A Gestures and sEMG
The formation of a hand gesture usually adheres to the following pattern. In the onset stage the hand and/or fingers start to execute a motion from a relaxed position until the point they reach their final state because of a physical constraint, and then in the gesture termination stage they go back to a relaxed position. In a sequence of gestures, like in the case of sign languages, the relaxed position is not reached for a long time. Therefore most hand gestures may be considered mixed from the muscle contraction perspective.
From the perspective of the contraction pattern, hand gestures can cause muscles to contract in an isotonic, isometric or mixed pattern. Isotonic contractions are muscular contractions against resistance in which the length of the muscle changes. In contrast, isometric contractions create no change in muscle length, but tension and energy fluctuate. An isometric contraction is typically performed against an immovable object.
Furthermore, a sequence of gestures is always a sequence of one or two isotonic contractions followed by exactly one isometric contraction. The reason is as follows. During the time interval between the relaxed and final states the contraction type is isotonic, because the length of the muscle changes. During the time period in which the final state of the gesture is maintained, the contraction type is isometric: tension and energy may fluctuate, but the length of the muscle stays constant. Finally, in the period when the gesture is terminated and a new gesture follows in the sequence, the contraction type is isotonic again: the hand/finger is released and/or immediately afterwards contracted again to form the new gesture.
Muscles generate an electric voltage during contraction and relaxation. EMG detectors measure this signal through electrodes that are attached to the skin. An analog-to-digital conversion is performed with a sampling rate of 100 up to 2000 Hz and the outcome is usually normalized into the range [-1.0, 1.0]. The typical bandwidth of this signal is 5-450 Hz. This set of time series (one per pair of electrodes) usually represents the input for gesture detection algorithms.
In terms of the number of sensors, two different types of measurement configuration are in use:
sparse EMG: Only a few sensors are attached to the skin. Typically, 8-10 sensors are used in this configuration.
dense EMG: Tens of sensors are attached to the skin. Usually, these sensors are arranged in a matrix and they cover an area of the skin. If the number of sensors is more than 100, the configuration is called high density EMG.
Both configurations have pros and cons. On the one hand, sparse EMG needs less bandwidth as it has fewer channels and less data to transfer; on the other hand, it is more sensitive to sensor placement. Dense EMG setups, in contrast, are less sensitive to sensor placement, but they need more bandwidth and raise wiring issues in wearable devices. There are several publicly available EMG datasets; some of them were recorded with a sparse sensor configuration, while others use a dense setup.
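To make the bandwidth argument concrete, the following minimal sketch compares the raw sample throughput of a sparse and a dense setup. The channel counts and sampling rates are those of the two datasets used later in this paper (NinaPro DB-1 and CapgMyo); the helper name `samples_per_second` is purely illustrative, and the bit depth per sample is deliberately left out because it is not stated here.

```python
def samples_per_second(channels: int, sampling_rate_hz: int) -> int:
    """Number of raw samples produced per second by all channels together."""
    return channels * sampling_rate_hz

# Sparse setup: 10 electrodes sampled at 100 Hz (NinaPro DB-1 style).
sparse = samples_per_second(channels=10, sampling_rate_hz=100)
# Dense setup: 128 channels sampled at 1 kHz (CapgMyo style).
dense = samples_per_second(channels=128, sampling_rate_hz=1000)

ratio = dense / sparse
print(sparse, dense, ratio)
```

Under these numbers the dense configuration produces roughly two orders of magnitude more samples per second than the sparse one, which is the trade-off described above.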
II-B sEMG-based gesture detection
During a (recording/testing) session, the human subjects perform several repetitions/trials of the same gesture set. Meanwhile, the sEMG electrodes are expected to remain in the same place. Multiple sessions naturally differ in sensor placement because of shift on the skin and rotation around the arm. The sensor placement and the electrode-skin contact vary the most between different subjects. In sEMG-based gesture recognition there are three cases in terms of data variability (as shown in Fig. 1):
Intra-session: in this case the data variability comes from differences between the trials/repetitions of the gestures performed by the human subject.
Inter-session: in this case there is still the intra-session variability, with additional data variability that comes from the differences between the recording sessions. At each recording session the sensor placement can be shifted and/or rotated.
Inter-subject: The electromyogram is a biological signal that is severely affected by the differences between subjects. In this case the data variability comes from the differences between human subjects.
Intra-session gesture recognition has been extensively researched. Existing sEMG-based solutions utilize time domain, frequency domain, and time-frequency domain features. Many researchers have focused on presenting new sEMG features based on their domain knowledge or on analyzing existing features to propose new feature sets. Traditional machine learning classifiers have been employed to recognize sEMG-based gestures, such as k-Nearest Neighbor (kNN), Linear Discriminant Analysis (LDA), Hidden Markov Model (HMM), and Support Vector Machine (SVM). The Convolutional Neural Network (CNN) architecture is the most widely used deep learning technique for sEMG-based gesture recognition. A novel CNN model has been proposed to extract spatial information from instantaneous sEMG images, achieving state-of-the-art performance. A novel hybrid CNN-RNN architecture has also been applied, with superior results in the intra-session scenario.
The inter-session and inter-subject variability cause a domain shift in the distributions of the sEMG sensor data. From a machine learning viewpoint, one of the key issues in inter-session/subject Muscle-Computer Interfaces (MCIs) is domain adaptation, i.e., developing learning algorithms in which the training data (source domain) used to learn a model have a different distribution compared with the data (target domain) to which the model is applied. Domain adaptation has gained increasing interest in the context of deep learning. When only a small amount of labeled data is available in the target domain during the training phase, fine-tuning pre-trained networks has become the de facto method.
Another approach is unsupervised adaptive learning, which utilises only unlabelled target data. One study compares four concepts that work with SVM and provides state-of-the-art results on the NinaPro dataset. The state-of-the-art solution on the CapgMyo dataset is a multi-source adaptive batch normalization technique that works with a CNN architecture. The drawback of this solution is that, in the case of multiple sources (i.e., multiple subjects), constraints and considerations are needed per source at the pre-training time of the model.
II-C Recurrent Neural Networks
With the increase of computational capabilities in recent years, neural networks have become more popular due to their ability to tackle complex data science problems. A typical neural network has an input layer, one or many hidden layers and an output layer. Each hidden layer has a set of nodes that take in weighted inputs from the previous layer and provide an output through an activation function to the next layer. Recurrent neural networks (RNNs) are a family of neural networks in which there are feedback loops in the system. Feedback loops allow the previous output to be processed together with the current input, thus making the network stateful: in each step it is influenced by, or “remembers”, the earlier inputs (see Fig. 2). A hidden layer that has feedback loops is also called a recurrent layer. The mathematical representation of a simple recurrent layer can be seen in Eq. (1).
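As an illustration of such a recurrent layer, here is a minimal sketch of the commonly used simple (Elman-style) recurrence h_t = tanh(W x_t + U h_{t-1} + b). The tanh activation, the weight names `W`, `U`, `b` and the toy values are assumptions made for this example and are not taken from the paper.

```python
import math

def rnn_step(x, h_prev, W, U, b):
    """One step of a simple recurrent layer: h = tanh(W x + U h_prev + b)."""
    h = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W[i][j] * x[j] for j in range(len(x)))            # input contribution
        s += sum(U[i][k] * h_prev[k] for k in range(len(h_prev)))  # feedback loop
        h.append(math.tanh(s))
    return h

def rnn_forward(xs, W, U, b):
    """Run the layer over a whole sequence and return the final hidden state."""
    h = [0.0] * len(b)
    for x in xs:
        h = rnn_step(x, h, W, U, b)
    return h

# Toy weights (hypothetical values, 2 inputs and 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.0, 0.4], [-0.3, 0.0]]
b = [0.0, 0.0]

seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 0.0]]   # same frames in a different order
h_a = rnn_forward(seq_a, W, U, b)
h_b = rnn_forward(seq_b, W, U, b)
print(h_a, h_b)
```

Because the hidden state is fed back at each step, the two sequences end in different states although they contain the same frames, which is what makes the layer sensitive to temporal order.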
However, regular RNNs suffer from the vanishing gradient problem, which means that the gradient of the loss function decays exponentially with time, making it difficult to learn long-term temporal dependencies in the input data. Long Short-Term Memory (LSTM) networks, a special type of RNN, have been proposed to solve this problem. In this paper we present a solution that utilizes LSTM cells.
LSTM units contain a set of gates that control when information enters the memory (input gate), when it is output (output gate) and when it is forgotten (forget gate), as seen in Eq. (2). This architecture allows the neural network to learn longer-term dependencies, and LSTMs are widely used to analyze time-series data. In Fig. 3 yellow rectangles represent a neural network layer, circles are point-wise operations and arrows denote the flow of data.
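The gating mechanism can be sketched with a single-unit LSTM cell in its widely used standard formulation. This is an illustrative sketch only: the scalar state, the parameter dictionary `p` and all numeric values are assumptions made for the example, not the configuration used in this paper.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell with scalar input and state.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))         # how much new information enters the memory
    f = sigmoid(pre("forget"))        # how much of the old memory is kept
    o = sigmoid(pre("output"))        # how much of the memory is exposed
    g = math.tanh(pre("candidate"))   # candidate value to be written
    c = f * c_prev + i * g            # updated cell state (the memory)
    h = o * math.tanh(c)              # updated hidden state (the output)
    return h, c

# Hypothetical parameters: a large forget-gate bias keeps the memory almost intact.
params = {
    "input": (1.0, 0.0, 0.0),
    "forget": (0.0, 0.0, 5.0),
    "output": (0.0, 0.0, 5.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
h, c = lstm_step(2.0, h, c, params)   # one informative input
c_first = c
for _ in range(4):                    # followed by four silent inputs
    h, c = lstm_step(0.0, h, c, params)
c_final = c
print(c_first, c_final)
```

With the forget gate close to 1, the cell state decays only slightly over the silent steps, which illustrates how the gates let the cell carry information across time.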
III 2-Stage Domain Adaptation
We propose a model that consists of two components, as can be seen in Fig. 5, and we name it 2-Stage RNN (2SRNN):
The domain adaptation layer: a single fully-connected layer without a non-linear activation function. The input vector has the same dimension as the output vector, namely the number of input features. The trainable weights form a square matrix, plus there is a bias vector.
The sequence classifier: a deep stacked RNN in a many-to-one setup, followed by a fully-connected layer with one output per gesture to be recognized and a softmax classifier.
The linear transformation for domain adaptation, with the same weight matrix and bias vector, is applied to the input of the RNN at each timestamp.
The transformation of the input values (to solve the domain shift) is approximated with perceptron learning. Our assumption is that this transformation is linear, and a linear transformation yields the highest gain. It could also be a more complex (polynomial or other non-linear) one; there is still a gain as long as the domain adaptation layer is smaller (in size and complexity) than the sequence classifier component.
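The per-timestep linear transformation can be sketched as follows. The symbol names `M` (square weight matrix) and `b` (bias vector), the three-channel toy frames and the "rotated armband" weights are hypothetical and serve only to illustrate the idea.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def affine(x, M, b):
    """Apply y = M x + b to one frame x (a list of feature values)."""
    return [sum(M[r][c] * x[c] for c in range(len(x))) + b[r] for r in range(len(b))]

def adapt_sequence(frames, M, b):
    """Apply the same (M, b) to every timestep of a sequence."""
    return [affine(x, M, b) for x in frames]

frames = [[0.1, -0.4, 0.3], [0.0, 0.2, -0.1]]   # 2 timesteps, 3 channels (toy values)

# Identity weights and zero bias leave the input untouched.
unchanged = adapt_sequence(frames, identity(3), [0.0, 0.0, 0.0])

# Hypothetical adapted weights: channels 0 and 1 swapped (e.g. a rotated armband) plus an offset.
M_rot = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
shifted = adapt_sequence(frames, M_rot, [0.5, 0.0, 0.0])
print(unchanged, shifted)
```

A permutation-like matrix such as `M_rot` is one example of a sensor shift that a square linear layer can express; the actual weights are of course learned from the target data rather than set by hand.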
Fig. 4 visualises our method with two consecutive stages:
III-B1 Pre-training stage
In the first stage, the weights of the domain adaptation layer are frozen and the sequence classifier is trained from scratch on the source dataset. The initial weights of the domain adaptation layer could be any combination of real numbers, but we chose the weight matrix to be the identity matrix and the bias vector to be a vector of zeros, so that the layer represents the identity transformation. We apply supervised learning. The optimization is gradient descent with backpropagation. The loss function is the categorical cross entropy:

$$L = -\sum_{c=1}^{G} I_c \log p_c \qquad (3)$$

where $G$ is the number of gestures, $I_c$ is the indicator function of whether class label $c$ is the correct classification for the given observation, and $p_c$ is the predicted probability that the observation is of class $c$.
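For a single observation the categorical cross entropy reduces to the negative log-probability assigned to the correct class. The following minimal sketch shows this together with a softmax output; the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into class probabilities (shifted by the maximum for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_class):
    """Loss for one observation: -sum_c I(c == true_class) * log(p_c) = -log(p_true)."""
    return -math.log(probs[true_class])

probs = softmax([2.0, 0.5, -1.0])                    # toy scores for 3 gestures
loss_correct = categorical_cross_entropy(probs, 0)   # true class is the most likely one
loss_wrong = categorical_cross_entropy(probs, 2)     # true class is the least likely one
print(probs, loss_correct, loss_wrong)
```

The loss is small when the classifier puts most of the probability mass on the correct gesture and grows as that probability shrinks.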
III-B2 Domain adaptation stage
In the second stage, the weights of the sequence classifier are frozen in their pre-trained state and the domain adaptation layer’s weights are trained on the target dataset. In this stage, the same supervised learning is applied. The loss (Eq. (3)) is backpropagated to the domain adaptation layer during the process. The advantages of this architecture:
The training during the second stage is very fast because only a shallow network has to be trained.
Training a linear layer ensures convergence.
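The second stage can be illustrated with a deliberately tiny toy problem. This is a sketch under strong simplifying assumptions, not the 2SRNN implementation: the frozen "sequence classifier" is replaced by a one-dimensional logistic classifier with hand-set weights `W_CLS` and `B_CLS`, the adaptation layer is a scalar affine map `z = m * x + b`, and the target data are made-up points. Only `m` and `b` are updated, starting from the identity transformation.

```python
import math

def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

# Assumed outcome of stage 1: a frozen classifier p(class 1) = sigmoid(W_CLS * z + B_CLS).
W_CLS, B_CLS = 2.0, 0.0

def predict(x, m, b):
    z = m * x + b                        # domain adaptation layer (scalar affine map)
    return sigmoid(W_CLS * z + B_CLS)    # frozen classifier

def accuracy(data, m, b):
    hits = sum((predict(x, m, b) > 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

def adapt(data, steps=20000, lr=0.05):
    """Stage 2: gradient descent on the adaptation weights only."""
    m, b = 1.0, 0.0                      # identity initialisation
    for _ in range(steps):
        grad_m = grad_b = 0.0
        for x, y in data:
            # Cross-entropy gradient w.r.t. z, back-propagated through the frozen classifier.
            dz = (predict(x, m, b) - y) * W_CLS
            grad_m += dz * x
            grad_b += dz
        m -= lr * grad_m / len(data)
        b -= lr * grad_b / len(data)
    return m, b

# Toy target domain: the classes are scaled and shifted relative to what the classifier expects.
target = [(1.5, 0), (2.0, 0), (2.5, 0), (3.5, 1), (4.0, 1), (4.5, 1)]
acc_before = accuracy(target, 1.0, 0.0)
m, b = adapt(target)
acc_after = accuracy(target, m, b)
print(acc_before, acc_after, m, b)
```

In this toy setting the frozen classifier labels every shifted sample as class 1 before adaptation, and the learned affine map moves the samples back to where the classifier separates them. The loss is convex in `m` and `b` here because the classifier is a single logistic unit; this is a property of the toy, chosen to keep the example short.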
IV Experimental setup
We approximate the inter-session and inter-subject shift in the values of the sEMG electrodes with a linear transformation of the input. The transformation could also be polynomial or otherwise non-linear, but that could result in either a larger domain adaptation network or a non-convex optimization problem. We let the model discover the coefficients of this linear transformation on the target data during the domain adaptation process.
We have tested the approach on HD sEMG and sparse sEMG datasets:
CapgMyo dataset: includes HD-sEMG data for 128 channels acquired from 23 intact subjects. The sampling rate is 1 kHz. It consists of 3 sub-databases:
DB-a: 8 isometric and isotonic hand gestures were obtained from 18 of the 23 subjects.
DB-b: 8 isometric and isotonic hand gestures from 10 of the 23 subjects in two recording sessions on different days.
DB-c: 12 basic movements of the fingers were obtained from 10 of the 23 subjects.
We downloaded the pre-processed version from http://zju-capg.org/myo/data in order to use the same data as prior work and to be able to compare our results with theirs. In that version, the power-line interference was removed from the sEMG signals by using a band-stop filter (45–55 Hz, second-order Butterworth). Only the static part of the movements was kept (for each trial, the middle one-second window, 1000 frames of data), to ensure that no transition movements are included. We rescaled the data to have zero mean and unit variance, then we rectified it and applied smoothing.
NinaPro dataset:
DB1: The NinaPro sub-database 1 (DB-1) is for the development of hand prostheses, and contains sparse multi-channel sEMG recordings. It consists of a total of 52 gestures performed by 27 intact subjects.
Gesture numbers 1–12: 12 basic movements of the fingers (flexions and extensions). These are equivalent to gestures in CapgMyo DB-c.
Gesture numbers 13–20: 8 isometric, isotonic hand configurations (“hand postures”). These are equivalent to gestures in CapgMyo DB-a and DB-b.
The data is recorded at a sampling rate of 100 Hz, using 10 sparsely located electrodes placed on the subjects’ upper forearms. The sEMG signals were rectified and smoothed by the acquisition device. We downloaded the re-organized version from http://zju-capg.org/myo/data/ninapro-db1.zip to use the same data as prior work for a fair comparison. For each trial, we used the middle 1.5-second window, 180 frames of data, to get the static part of the movements, with the aim that no transition movements are included.
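The rescaling, rectification and smoothing applied to the CapgMyo data can be sketched for a single channel as below. The type of smoothing is not specified in the text, so the causal moving average and its width are assumptions, as is the choice to standardize one channel at a time; the signal values are made up.

```python
import math

def standardize(signal):
    """Rescale a channel to zero mean and unit variance."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    std = math.sqrt(var)
    return [(v - mean) / std for v in signal]

def rectify(signal):
    """Full-wave rectification: keep the magnitude of each sample."""
    return [abs(v) for v in signal]

def moving_average(signal, width):
    """Causal moving average; the first samples average over fewer points."""
    out = []
    for t in range(len(signal)):
        window = signal[max(0, t - width + 1): t + 1]
        out.append(sum(window) / len(window))
    return out

raw = [0.2, -0.1, 0.4, -0.3, 0.1, -0.2, 0.3, -0.4]   # one channel, toy values
z = standardize(raw)
envelope = moving_average(rectify(z), width=3)
print(envelope)
```

The result is a non-negative, slowly varying envelope of the channel, which is the usual purpose of rectifying and smoothing an sEMG signal.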
We decompose the sEMG signals into short sequences using a sliding window strategy with overlapping windows. The sequence length must be shorter than 300 ms to satisfy real-time usage constraints. To compare our proposed method with previous works, we follow the segmentation strategy of previous studies.
The classification accuracy is calculated for each database from the classification run on the test dataset, which is taken from the same database as the training dataset in the manner detailed in the subsequent sections:

$$\text{Classification accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of classifications}} \times 100\%$$
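The overlapping segmentation can be sketched as follows for one CapgMyo trial of 1000 frames with 150 ms windows. The stride of 70 frames is an assumed value chosen only for illustration, since the step size is not stated here.

```python
def sliding_windows(num_frames, window_len, stride):
    """Return (start, end) frame indices of overlapping windows over one trial."""
    return [(s, s + window_len) for s in range(0, num_frames - window_len + 1, stride)]

SAMPLING_RATE_HZ = 1000                        # CapgMyo sampling rate
window_len = 150 * SAMPLING_RATE_HZ // 1000    # 150 ms -> 150 frames
stride = 70                                    # assumed step between window starts

windows = sliding_windows(num_frames=1000, window_len=window_len, stride=stride)
print(len(windows), windows[0], windows[-1])
```

Each `(start, end)` pair selects one input sequence for the RNN; consecutive windows share `window_len - stride` frames.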
V-A Intra-session validation
We first look at intra-session validation to benchmark our sequence classifier against the state-of-the-art in the least challenging scenario, on top of distinct datasets, without performing any further optimization.
In the case of the CapgMyo dataset we used the same evaluation procedure as the previous study. For each subject, a classifier was trained by using 50% of the data (e.g., trials 1, 3, 5, 7 and 9 for that subject) and tested by using the remaining half. This procedure was performed on each sub-database. For DB-b, the second session of each subject was used for the evaluation.
We chose a 150-ms sequence length for the RNN in all cases for a fair comparison. Table I shows our average intra-session recognition accuracy together with the state-of-the-art. Columns labelled DB-a, DB-b and DB-c belong to the CapgMyo dataset, and columns labelled DB-1 12 gestures and DB-1 8 gestures belong to the NinaPro dataset.
Table I: Average intra-session recognition accuracy
| DB-a | DB-b | DB-c | DB-1 12 gestures | DB-1 8 gestures |
’-’ denotes that the authors of that method did not focus on the scenario.
As can be seen from Table I, the accuracy achieved by our model is at most 2.4 percentage points worse than other methods for the CapgMyo database and 0.7 to 7.7 percentage points better for the NinaPro dataset. This accuracy has been achieved with the training duration kept constant at 100 epochs and without any hyper-parameter tuning. This outcome indicates that our model is at least comparable with other approaches in the intra-session case.
V-B Inter-session validation
We evaluated inter-session recognition for CapgMyo DB-b, in which the model was trained using data recorded from the first session and evaluated using data recorded from the second session. In each case, we ran our domain adaptation for 100 epochs using the following 3 scenarios:
Scenario 1: domain adaptation is not applied,
Scenario 2: domain adaptation performed on the complete set of target data (all data of the target session); this scenario has only been considered for the purpose of comparability with alternative approaches,
Scenario 3: domain adaptation performed on 50% of the trials of the target session, while the validation set is the remaining 50%.
Our adaptation scheme enhanced inter-session recognition by 34 percentage points (accuracy of 78.3% compared to 44.3%), which is a 77% relative improvement (shown in Table II).
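The two ways of expressing the gain, in percentage points and as a relative improvement over the baseline, can be checked with a few lines of arithmetic. The helper name `improvement` is illustrative; the two accuracies are the ones quoted above.

```python
def improvement(baseline_acc, adapted_acc):
    """Return the gain in percentage points and the relative improvement in percent."""
    points = adapted_acc - baseline_acc
    relative = 100.0 * points / baseline_acc
    return points, relative

points, relative = improvement(44.3, 78.3)
print(round(points, 1), round(relative))
```

The relative figure divides the gain by the accuracy obtained without domain adaptation, which is why it is much larger than the gain in percentage points.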
Table II: Inter-session recognition accuracy on CapgMyo DB-b
| Scenario 1 | Scenario 2 | Scenario 3 |
V-C Inter-subject validation
In this experiment, we evaluated inter-subject recognition of 8 gestures using the second recording session of CapgMyo DB-b and the recognition of 12 gestures using CapgMyo DB-c and the sub-set of 12 gestures from the NinaPro DB-1. We performed a leave-one-out cross-validation, in which each of the subjects was used in turn as the test subject and a classifier was trained using the data of the remaining subjects, using the following 3 scenarios:
Scenario 1: domain adaptation is not applied,
Scenario 2: domain adaptation performed on the complete set of target data (all data of the target subject); this scenario has only been considered for the purpose of comparability with alternative approaches,
Scenario 3: domain adaptation performed on 50% of the trials of the target subject, while the validation set is the remaining 50%, with the following 2 variants:
CapgMyo DB-b, DB-c: 50%-50% of the target subject data (5 of the 10 trials are used for domain adaptation and the other 5 for its validation).
NinaPro DB-1 12 gestures: 50%-50% of the target subject data (5 of the 10 trials are used for domain adaptation and the other 5 for its validation).
In the case of CapgMyo DB-b and DB-c we ran our domain adaptation for 100 epochs, and in the case of NinaPro DB-1 for 400 epochs. The sequence length of our RNN was 150 ms for CapgMyo DB-b and DB-c and 400 ms for NinaPro DB-1, in both cases for comparability with prior work. Table III shows the classification accuracies of the various methods.
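The leave-one-out protocol and the 50%-50% split of the target subject's trials can be sketched as follows. The subject and trial identifiers are placeholders, and splitting the trials by alternating odd and even indices is an assumption; the text only fixes the 5/5 proportion.

```python
def leave_one_subject_out(subjects):
    """Yield (source_subjects, target_subject) pairs, each subject being the target once."""
    for target in subjects:
        yield [s for s in subjects if s != target], target

def split_trials(trials):
    """Split the target subject's trials 50%-50% into adaptation and validation sets."""
    return trials[0::2], trials[1::2]

subjects = list(range(1, 11))   # e.g. 10 subjects
trials = list(range(1, 11))     # 10 trials per gesture

folds = list(leave_one_subject_out(subjects))
adapt_trials, val_trials = split_trials(trials)
print(len(folds), adapt_trials, val_trials)
```

In each fold the classifier is pre-trained on the source subjects; in Scenario 3 the adaptation layer is then trained on `adapt_trials` of the target subject and evaluated on `val_trials`.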
Table III: Inter-subject recognition accuracy
| Scenario 1 | Scenario 2 | Scenario 3 |
Our adaptation scheme enhanced inter-subject recognition by 72% on DB-b, 147% on DB-c and 91% on DB-1 12 gestures (shown in Table III).
We summarise the domain adaptation improvement results in Table IV. As indicated there, the performance of 2SRNN is superior in all cases: the improvement obtained from our domain adaptation in the inter-session and inter-subject cases exceeds those obtained through alternative domain adaptation approaches.
Table IV: Domain adaptation improvement
| Inter-session improvement | Inter-subject improvement |
It is natural to ask how much data is required to obtain a stable recognition accuracy and how our solution relates to the common supervised fine-tuning method in deep learning. Fig. 6 visualises a comparison of the inter-subject domain adaptation scenario (on CapgMyo DB-b) based on our 2-stage RNN method and an adaptation based on supervised fine-tuning in one concrete scenario.
In this experiment we limited the available data to 20%, 40%, 60%, 80% and 100% of the 5 trials used for domain adaptation (the remaining 5 trials are kept for validation). The mean classification accuracy is plotted as a function of the available target data for domain adaptation. Fig. 6 shows how the accuracy of the two methods increases with the amount of available target data, and that our 2SRNN remains persistently superior to the fine-tuning method (by 20%). In each case we ran the domain adaptation for 5 epochs only, since improvements are expected quickly for a good human-computer interaction. On our server (with 2 Nvidia Titan V GPUs) these 5 training epochs took approximately 7.5 seconds for our 2SRNN and 27.7 seconds for supervised fine-tuning, respectively. Therefore a 20% improvement in accuracy is complemented by a decrease in execution time by almost a factor of 4.
For real Human-Computer Interaction, sEMG-based gesture detection must overcome the inter-session and inter-subject domain shifts. We proposed a 2-stage domain adaptation solution that has superior performance over the well-known supervised fine-tuning applied in deep learning and over the state-of-the-art unsupervised adaptation methods. Empirical results support the assumption that the transformation of the input values needed to solve the domain shift can be approximated by a linear one. The solution is fast and lightweight, and it is applicable to any machine learning approach that is trainable with backpropagation.
The code is available at https://github.com/ketyi/2SRNN.
-  H. Cheng, L. Yang, and Z. Liu, “Survey on 3D Hand Gesture Recognition”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 26, No. 9, September 2016, pp. 1659–1673.
-  J. Li, T. Ma, X. Zhou, Y. Liu, S. Cheng, C. Ye, and Y. Wang, “A Real-time Human Motion Recognition System Using Topic Model and SVM”, 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 2017, pp. 173–176.
-  S. Ranasinghe, F. Machot, and H. Mayr, “A review on applications of activity recognition systems with regard to performance and evaluation”, Intl. Journal of Distributed Sensor Networks, 2016, Vol. 12(8).
-  V. Patel, R. Gopalan, R. Li, and R. Chellappa, “Visual Domain Adaptation: An Overview of Recent Advances”, IEEE Signal Processing Magazine, May 2015, Vol. 32 (3), pp. 53–69.
-  V. Gregori, A. Gijsberts, and B. Caputo, “Adaptive Learning to Speed-Up Control of Prosthetic Hands: a Few Things Everybody Should Know”, 2017 International Conference on Rehabilitation Robotics (ICORR), QEII Centre, London, UK, July 17-20, 2017, pp. 1130–1137.
-  G. Andrew, R. Arora, J. Bilmes, and K. Livescu, “Deep Canonical Correlation Analysis”, 30th Intl Conference on Machine Learning, Atlanta, Georgia, USA, 2013.
-  N. Patricia, T. Tommasi, and B. Caputo, “Multi-Source Adaptive Learning for Fast Control of Prosthetics Hand”, Intl Conference on Pattern Recognition, Stockholm, Sweden, 2014.
-  EMG dataset in lower limb, http://archive.ics.uci.edu/ml/datasets/emg+dataset+in+lower+limb, retrieved on 27.12.2018.
-  R. N. Khushaba, M. Takruri, S. Kodagoda, and G. Dissanayake, “Toward Improved Control of Prosthetic Fingers Using Surface Electromyogram (EMG) Signals”, Expert Systems with Applications, vol 39, no. 12, pp. 10731–10738, 2012.
-  Mimetic Interfaces: Facial Surface EMG Dataset 2015: https://tutcris.tut.fi/portal/en/datasets/mimetic-interfaces-facial-surface-emg-dataset-2015(8a21105e-4eca-4531-b021-a62509711ee0).html.
-  C. Sapsanis, G. Georgoulas, A. Tzes, D. Lymberopoulos, “Improving EMG based classification of basic hand movements using EMD”, in 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 13), July 3-7, 2013, pp. 5754–5757.
-  CapgMyo: A High Density Surface Electromyography Database for Gesture Recognition: http://zju-capg.org/myo/data, accessed on 27.12.2018.
-  S. Pizzolato et al., “Comparison of six electromyography acquisition setups on hand movement classification tasks,” PloS one, vol. 12, no. 10, p. e0186132, 2017.
-  D. Rempel, M. Camilleri, and D. Lee, “The Design of Hand Gestures for Human-Computer Interaction: Lessons from Sign Language Interpreters”, Int J Hum Comput Stud. 2015 Oct; 72(10-11): 728–735.
-  J. Garcia, M. Cannito, and P. Dagenais, “Hand Gestures: Perspectives and Preliminary Implications for Adults With Acquired Dysarthria”, American Journal of Speech-Language Pathology, Vol. 9, pp. 107–115, May 2000.
-  U. Cote-Allard, C. Fall, A. Drouin, A. Campeau-Lecours, C. Gosselin, K. Glette, F. Laviolette, and B. Gosselin, “Deep Learning for Electromyographic Hand Gesture Signal Classification Using Transfer Learning”, arXiv:1801.07756v4 [cs.LG] 19 Nov 2018.
-  N. Nazmi, M. Rahman, S. Yamamoto, S. Ahmad, H. Zamzuri, and S. Mazlan, “A Review of Classification Techniques of EMG Signals during Isotonic and Isometric Contractions”, Sensors (Basel). 2016 Aug; 16(8): 1304.
-  Y. Du, W. Jin, W. Wei, Y. Hu, and W. Geng, “Surface EMG-Based Inter-Session Gesture Recognition Enhanced by Deep Domain Adaptation”, Sensors 2017, 17, 458; doi:10.3390/s17030458.
-  Y. Hu, Y. Wong, W. Wei, Y. Du, M. Kankanhalli, W. Geng, “A novel attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition”, PLoS One, 2018;13(10):e0206049, Published 2018 Oct 30, doi:10.1371/journal.pone.0206049.
-  Kim J, Mastnik S, André E. EMG-based hand gesture recognition for realtime biosignal interfacing. In: International Conference on Intelligent User Interfaces; 2008. p. 30–39.
-  Menon R, Caterina GD, Lakany H, Petropoulakis L, Conway B, Soraghan J. Study on interaction between temporal and spatial information in classification of EMG signals in myoelectric prostheses. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2017;PP(99):1–1.
-  Yun LK, Swee TT, Anuar R, Yahya Z, Yahya A, Kadir MRA. Sign Language Recognition System using SEMG and Hidden Markov Model. In: International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems; 2013. p. 50–53.
-  Atzori M., Gijsberts A., Castellini C., Caputo B., Hager A.G.M., Elsig S., Giatsidis G., Bassetto F., Müller H. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data. 2014;1:140053. doi: 10.1038/sdata.2014.53.
-  D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, ICLR, 2014
-  P. Konrad: The ABC of EMG: a practical introduction to kinesiological electromyography, ISBN: 0977162214, 2006
-  Christopher Olah: Understanding LSTM Networks (retr. on 11.01.2018) http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
-  Razvan Pascanu et al.: On the difficulty of training recurrent neural networks. 2013.
-  S. Hochreiter and J. Schmidhuber: Long short-term memory. Neural Computation, 1997.
-  Chung et al.: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. (2014).