I Introduction
As the leading risk factor for cardiovascular diseases (CVD) [1], high blood pressure (BP) is commonly used as a critical criterion for diagnosing and preventing CVD. Therefore, accurate and continuous BP monitoring in daily life is imperative for the early detection of and intervention in CVD. Traditional BP measurement devices, e.g., Omron products, are cuff-based and therefore bulky, uncomfortable to use, and limited to snapshot measurements. These disadvantages restrict the use of cuff-based devices for long-term and continuous BP measurement, which is essential for nighttime monitoring and the precise diagnosis of different CVD symptoms.
A key feature of the cardiovascular system is its complex dynamic self-regulation, which involves multiple feedback control loops in response to BP variation [2]. This mechanism gives BP dynamics an inherent temporal dependency, which is critical for continuous BP prediction and, in particular, for long-term BP prediction.
Existing methods for cuffless and continuous BP estimation fall into two groups: physiological models, e.g., the pulse transit time (PTT) model [3][4], and regression models, such as decision trees and support vector regression [5][6]. These models suffer from accuracy decay over time, especially for multiday continuous BP prediction, and this limitation has become the bottleneck preventing their use in practical applications. Notably, these models directly map the present input to the target while ignoring the important temporal dependencies in BP dynamics, which could be the root cause of their long-term inaccuracy.
Compared with static BP prediction, multiday BP prediction is generally much more challenging: due to the complex regulation mechanisms of the human body, multiday BP dynamics have more intricate temporal dependencies and a larger variation range. In this paper, we formulate BP prediction as a sequence learning problem and propose a novel deep RNN model, which we show to be effective for modeling long-range dependencies in BP dynamics and which achieves state-of-the-art accuracy on multiday continuous BP prediction.
II The Model
The goal of arterial BP prediction is to use multiple temporal physiological signals to predict a BP sequence. Let $X = (x_1, x_2, \dots, x_T)$ be the input features extracted from electrocardiography (ECG) and photoplethysmogram (PPG) signals, and let $Y = (y_1, y_2, \dots, y_T)$ denote the target BP sequence. The conditional probability $p(Y \mid X)$ is factorized as:
$$p(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid h_t) \tag{1}$$
where $h_t$ can be interpreted as the hidden state of the BP dynamic system, generated from the previous hidden state $h_{t-1}$ and the current input $x_t$ as:
$$h_t = f(h_{t-1}, x_t) \tag{2}$$
Figure 1 illustrates an overview of the proposed deep RNN model. The deep RNN consists of a bidirectional Long Short-Term Memory (LSTM) layer at the bottom, followed by a stack of LSTM layers with residual connections. The full network was trained with backpropagation through time [7] to minimize the difference between the BP prediction and the ground truth.
II-A Bidirectional LSTM Structure
First, we introduce the basic block of our deep RNN model, a one-layer bidirectional LSTM. LSTM [8] was designed to address the vanishing gradient problem of conventional RNNs by introducing a memory cell state $c_t$ and multiple gating mechanisms into the standard RNN hidden-state transition. The hidden state $h_t$ of an LSTM is generated by:
$$f_t = \sigma\left(W_{xf}x_t + W_{hf}h_{t-1} + b_f\right) \tag{3}$$
$$i_t = \sigma\left(W_{xi}x_t + W_{hi}h_{t-1} + b_i\right) \tag{4}$$
$$o_t = \sigma\left(W_{xo}x_t + W_{ho}h_{t-1} + b_o\right) \tag{5}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc}x_t + W_{hc}h_{t-1} + b_c\right) \tag{6}$$
$$h_t = o_t \odot \tanh(c_t) \tag{7}$$
where $f_t$, $i_t$ and $o_t$ are respectively the forget gate, input gate and output gate, which control how much information is forgotten, accumulated or output. The $W$ and $b$ terms denote weight matrices and bias vectors, respectively. $\sigma$ and $\tanh$ stand for elementwise application of the logistic sigmoid function and the hyperbolic tangent function, respectively, and $\odot$ denotes elementwise multiplication.
Conventional LSTMs use $h_{t-1}$ to capture information from the past history $x_1, \dots, x_{t-1}$ and the present input $x_t$. To access a larger-scale temporal context of the input sequence, one can also incorporate nearby future information to inform the downstream modeling process. A bidirectional RNN (BRNN) [9] realizes this by processing the data in both the forward and backward directions with two separate hidden layers, which are then merged into the same output layer. As illustrated at the bottom of Figure 1, a BRNN computes a forward hidden state $\overrightarrow{h}_t$, a backward hidden state $\overleftarrow{h}_t$ and the final output $y_t$ by the following equations:
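$$\overrightarrow{h}_t = \mathcal{H}\left(W_{x\overrightarrow{h}}x_t + W_{\overrightarrow{h}\overrightarrow{h}}\overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \tag{8}$$
$$\overleftarrow{h}_t = \mathcal{H}\left(W_{x\overleftarrow{h}}x_t + W_{\overleftarrow{h}\overleftarrow{h}}\overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\right) \tag{9}$$
$$y_t = W_{\overrightarrow{h}y}\overrightarrow{h}_t + W_{\overleftarrow{h}y}\overleftarrow{h}_t + b_y \tag{10}$$
where $\mathcal{H}$ denotes the LSTM hidden-state update of Eqs. (3)-(7).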
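To make Eqs. (3)-(10) concrete, the following is a minimal NumPy sketch of one LSTM direction and a bidirectional layer whose two directions are merged by a linear output layer. It is an illustrative forward pass only, not the authors' implementation; the parameter dictionary layout, the initialization scale (0.1) and the helper names `init_lstm`, `lstm_step`, `lstm_layer` and `bilstm` are hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hid, rng):
    """Small random weights and zero biases for one LSTM direction (illustrative init)."""
    p = {}
    for g in ("f", "i", "o", "c"):
        p["Wx" + g] = rng.normal(0.0, 0.1, (n_hid, n_in))
        p["Wh" + g] = rng.normal(0.0, 0.1, (n_hid, n_hid))
        p["b" + g] = np.zeros(n_hid)
    return p

def lstm_step(p, x_t, h_prev, c_prev):
    """One LSTM transition following Eqs. (3)-(7)."""
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])  # forget gate, Eq. (3)
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])  # input gate, Eq. (4)
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])  # output gate, Eq. (5)
    c = f * c_prev + i * np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # Eq. (6)
    h = o * np.tanh(c)                                          # Eq. (7)
    return h, c

def lstm_layer(p, X, reverse=False):
    """Run one LSTM direction over X (T x n_in); returns hidden states (T x n_hid)."""
    T, n_hid = X.shape[0], p["bf"].shape[0]
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    H = np.zeros((T, n_hid))
    for t in (range(T - 1, -1, -1) if reverse else range(T)):
        h, c = lstm_step(p, X[t], h, c)
        H[t] = h
    return H

def bilstm(p_fwd, p_bwd, W_fy, W_by, b_y, X):
    """Eqs. (8)-(10): forward and backward states merged by one linear output layer."""
    H_fwd = lstm_layer(p_fwd, X)                # recursion over h_{t-1}
    H_bwd = lstm_layer(p_bwd, X, reverse=True)  # recursion over h_{t+1}
    return H_fwd @ W_fy.T + H_bwd @ W_by.T + b_y

rng = np.random.default_rng(0)
T, n_in, n_hid, n_out = 32, 7, 16, 16   # 7 input features, sequence length 32
X = rng.normal(size=(T, n_in))
p_fwd, p_bwd = init_lstm(n_in, n_hid, rng), init_lstm(n_in, n_hid, rng)
W_fy, W_by = rng.normal(0, 0.1, (n_out, n_hid)), rng.normal(0, 0.1, (n_out, n_hid))
Y = bilstm(p_fwd, p_bwd, W_fy, W_by, np.zeros(n_out), X)  # shape (32, 16)
```

Note that the backward state at the last time step depends only on the last input, which is exactly what gives each output $y_t$ access to both past and future context.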
II-B Multilayered Architecture with Residual Connections
A variety of experimental results [10][11] suggest that RNNs with deep architectures can significantly outperform shallow RNNs. Simply stacking multiple RNN layers readily increases expressive power. However, a deep network becomes increasingly difficult to train as it gets deeper, likely due to exploding and vanishing gradient problems [12].
Inspired by the idea of attaching an identity skip connection between adjacent layers, which has shown good performance for training deep neural networks [13][14][15], we incorporate a residual connection from one LSTM layer to the next in our model, as shown in Figure 2.
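Let $x^l_t$, $h^l_t$ and $\mathcal{H}_l$ be the input, hidden state and LSTM function, respectively, associated with the $l$-th LSTM layer ($l = 1, 2, \dots, L$), and let $W^l$ be the corresponding weights of $\mathcal{H}_l$. The input to the $l$-th LSTM layer, $x^l_t$, is added elementwise to this layer's hidden state $h^l_t$, and the sum is fed to the next LSTM layer. The LSTM block with residual connections can be implemented by:
$$h^l_t = \mathcal{H}_l\left(x^l_t, h^l_{t-1}; W^l\right) \tag{11}$$
$$x^{l+1}_t = x^l_t + h^l_t \tag{12}$$
$$h^{l+1}_t = \mathcal{H}_{l+1}\left(x^{l+1}_t, h^{l+1}_{t-1}; W^{l+1}\right) \tag{13}$$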
The deep RNN model can be created by stacking multiple such LSTM blocks on top of each other, with the output of the previous block forming the input of the next. Once the top-layer hidden state $h^L_t$ is computed, the output can be obtained by:
$$y_t = W_{hy}h^L_t + b_y \tag{14}$$
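The residual stacking of Eqs. (11)-(14) can be sketched in NumPy as below. This is a simplified illustration under stated assumptions: every layer is a unidirectional LSTM with its four gate blocks stacked into one matrix `W` (a common, mathematically equivalent packing of Eqs. (3)-(7)), and the input width already equals the hidden size (as it would after a bidirectional first layer), since the elementwise sum in Eq. (12) requires matching dimensions. The names `lstm_layer`, `residual_stack` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(W, b, X):
    """Unidirectional LSTM over X (T x n); gates f, i, o and the candidate are stacked in W, b."""
    T, n = X.shape[0], b.shape[0] // 4
    h, c = np.zeros(n), np.zeros(n)
    H = np.zeros((T, n))
    for t in range(T):
        z = W @ np.concatenate([X[t], h]) + b
        f, i, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        g = np.tanh(z[3 * n:])
        c = f * c + i * g           # memory cell update
        h = o * np.tanh(c)          # hidden state
        H[t] = h
    return H

def residual_stack(layers, X):
    """Eqs. (11)-(13): each layer's input is added to its hidden state and passed upward."""
    x = X
    for W, b in layers:
        h = lstm_layer(W, b, x)     # h^l = H_l(x^l), Eq. (11)
        x = x + h                   # x^{l+1} = x^l + h^l, Eq. (12)
    return h, x                     # top-layer hidden state h^L and x^{L+1}

def predict(layers, X, W_y, b_y):
    """Eq. (14): linear read-out from the top-layer hidden state h^L."""
    h_top, _ = residual_stack(layers, X)
    return h_top @ W_y.T + b_y

rng = np.random.default_rng(0)
T, n, L = 32, 16, 4
X = rng.normal(size=(T, n))
layers = [(rng.normal(0, 0.1, (4 * n, 2 * n)), np.zeros(4 * n)) for _ in range(L)]
W_y, b_y = rng.normal(0, 0.1, (3, n)), np.zeros(3)   # 3 outputs: SBP, DBP, MBP
Y = predict(layers, X, W_y, b_y)                     # shape (32, 3)
```

With all LSTM weights set to zero, every $h^l_t$ is exactly zero and the residual path carries $x^1_t$ unchanged to the top, which is the identity behavior the skip connections are meant to guarantee.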
II-C Multitask Training
Since we have multiple closely related supervision signals, namely systolic BP (SBP), diastolic BP (DBP) and mean BP (MBP), we adopt a multitask training strategy that trains a single model to predict SBP, DBP and MBP in parallel. Accordingly, the training objective is to minimize the mean squared error (MSE) over all $N$ training samples:
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\sum_{k \in \{\mathrm{SBP},\,\mathrm{DBP},\,\mathrm{MBP}\}}\left(y^k_n - \hat{y}^k_n\right)^2 + \lambda\,\Omega(W) \tag{15}$$
where $y^k_n$ represents the ground truth, $\hat{y}^k_n$ is the corresponding prediction, $\Omega(W)$ represents the regularization of the model parameters and $\lambda$ is the corresponding penalty coefficient. One advantage of multitask training is that learning to predict the different BP values simultaneously may implicitly encode the quantitative constraints among SBP, DBP and MBP (e.g., SBP > MBP > DBP).
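A minimal sketch of the multitask objective follows. It assumes that each sample carries three targets (SBP, DBP, MBP) and that the regularizer is an L2 penalty on a flat list of weights; the source only states that a regularization term with a penalty coefficient is used, so the L2 form and the function name `multitask_loss` are assumptions for illustration.

```python
def multitask_loss(y_true, y_pred, weights, lam):
    """Squared errors summed over the SBP, DBP and MBP targets, averaged over samples,
    plus an (assumed) L2 penalty lam * sum(w^2) on the model weights."""
    n = len(y_true)
    sq_err = sum(
        (t - p) ** 2
        for row_t, row_p in zip(y_true, y_pred)   # one row per sample
        for t, p in zip(row_t, row_p)             # columns: SBP, DBP, MBP
    )
    l2 = sum(w * w for w in weights)
    return sq_err / n + lam * l2

# Two samples with normalized (SBP, DBP, MBP) targets; only one SBP value is off by 0.1.
y_true = [[1.0, 0.5, 0.7], [0.9, 0.6, 0.7]]
y_pred = [[1.0, 0.5, 0.7], [0.8, 0.6, 0.7]]
loss = multitask_loss(y_true, y_pred, weights=[1.0, -2.0], lam=0.01)  # about 0.005 + 0.05
```

Because all three targets share one model and one loss, gradient updates from any target shape the same hidden representation, which is the mechanism behind the shared-representation argument made for multitask training.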
III Analysis of Deep RNN Architecture
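RNNs are inherently deep in time because of their hidden-state transitions. In addition to this depth in time, the proposed deep RNN model is also deep along the layer dimension. To simplify the analysis, we focus on the gradient flow along the depth of layers at a fixed time step $t$. By recursively applying Equation 12, we have:
$$x^L_t = x^l_t + \sum_{i=l}^{L-1} \mathcal{H}_i\left(x^i_t, h^i_{t-1}; W^i\right) \tag{16}$$
for any deeper layer $L$ and shallower layer $l$. Equation 16 leads to favorable backpropagation properties. Denoting the loss function by $\mathcal{E}$, the chain rule gives:
$$\frac{\partial \mathcal{E}}{\partial x^l_t} = \frac{\partial \mathcal{E}}{\partial x^L_t}\frac{\partial x^L_t}{\partial x^l_t} = \frac{\partial \mathcal{E}}{\partial x^L_t}\left(1 + \frac{\partial}{\partial x^l_t}\sum_{i=l}^{L-1}\mathcal{H}_i\left(x^i_t, h^i_{t-1}; W^i\right)\right) \tag{17}$$
Equation 17 shows that the gradient $\frac{\partial \mathcal{E}}{\partial x^l_t}$ decomposes into two additive terms: a term $\frac{\partial \mathcal{E}}{\partial x^L_t}$ that propagates information directly without passing through any weight layer, and a term $\frac{\partial \mathcal{E}}{\partial x^L_t}\frac{\partial}{\partial x^l_t}\sum_{i=l}^{L-1}\mathcal{H}_i$ that propagates through the weight layers. The first term ensures that the supervision signal can backpropagate directly to any shallower layer $x^l_t$. In general, the term $\frac{\partial}{\partial x^l_t}\sum_{i=l}^{L-1}\mathcal{H}_i$ cannot equal $-1$ for all samples in a minibatch, so the gradient is unlikely to be canceled out. This implies that the gradient of a layer does not vanish even when the intermediate weights are arbitrarily small. This property allows us to train a deeper, more expressive RNN while largely avoiding the vanishing gradient problem along the depth dimension.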
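The effect described by Eq. (17) can be checked numerically on a toy model. The sketch below replaces each LSTM layer with a hypothetical scalar layer $\mathcal{H}_i(x) = \tanh(w_i x)$ (an assumption made purely for illustration) and compares $\partial x^L / \partial x^1$ for a plain stack and a residual stack when all weights are tiny. In the residual case each chain-rule factor is $1 + \partial\mathcal{H}_i/\partial x^i$, whose product expands to the "$1 + \dots$" form of Eq. (17).

```python
import math

def forward(x, weights, residual):
    """Toy scalar stack with H_i(x) = tanh(w_i * x); residual adds the identity path (Eq. 12)."""
    xs = [x]
    for w in weights:
        h = math.tanh(w * x)
        x = x + h if residual else h
        xs.append(x)
    return xs

def grad_wrt_input(x, weights, residual):
    """Chain rule for dx^L/dx^1: each factor is dH_i/dx^i (plain) or 1 + dH_i/dx^i (residual)."""
    xs = forward(x, weights, residual)
    g = 1.0
    for w, xi in zip(weights, xs[:-1]):
        dh = w * (1.0 - math.tanh(w * xi) ** 2)   # derivative of tanh(w x) at x = xi
        g *= (1.0 + dh) if residual else dh
    return g

weights = [1e-3] * 4                         # arbitrarily small weights, 4 layers
g_res = grad_wrt_input(0.5, weights, True)   # stays close to 1
g_plain = grad_wrt_input(0.5, weights, False)  # roughly 1e-12: vanished
```

The plain stack's gradient shrinks like the product of the weights, while the residual stack keeps a gradient of order one, which is the behavior the analysis above attributes to the identity term.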
IV Experiments
We evaluate the proposed model on both a static and a multiday continuous BP dataset. The root mean square error (RMSE) is used as the evaluation metric, defined as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$. On both datasets we compare our model with the following reference models:
Typical regression models: support vector regression (SVR), decision tree (DT), and Bayesian linear regression (BLR).
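The RMSE metric defined above translates directly into code. The helper name `rmse` is hypothetical, and the example values are made-up BP readings in mmHg used only to show the arithmetic.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted BP values (mmHg)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Errors of 2, 1 and 4 mmHg give sqrt((4 + 1 + 16) / 3) = sqrt(7), about 2.65 mmHg.
example = rmse([120.0, 118.0, 125.0], [122.0, 117.0, 121.0])
```

Because errors are squared before averaging, RMSE penalizes occasional large deviations more heavily than the mean absolute error would.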
IV-A Dataset
Static continuous BP dataset. The dataset, including ECG, PPG and BP recordings, was obtained from 84 healthy subjects (51 males and 33 females). ECG and PPG signals were acquired with a Biopac system, and the reference continuous BP was simultaneously measured with a Finapres system in each experiment. The BP, ECG and PPG data of each subject were recorded at a sampling frequency of 1000 Hz for 10 minutes at rest.
Multiday continuous BP dataset. A similar dataset was obtained from 12 healthy subjects (11 males and 1 female). The BP, ECG and PPG data of each subject were recorded for 8 minutes at rest on multiple days, namely the 1st day, the 2nd day, the 4th day and 6 months after the first day.
IV-B Data Representation
Since the primary goal of this paper is to demonstrate the importance of modeling temporal dependencies in BP dynamics for accurate BP prediction, we simply select 7 representative handcrafted features of the ECG and PPG signals (shown in Fig. 3):
Pulse transit time (PTT): time interval from the ECG R peak to the maximum slope of the PPG in the same heart cycle.
Heart rate
Reflection index
Systolic timespan
Up time
Systolic volume
Diastolic volume
The input $X$ thus becomes a $7 \times T$ matrix, and each row of $X$ is normalized to zero mean and unit variance. Further performance gains may be obtained by adding more informative features as model inputs.
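Of the seven features, only PTT is defined explicitly in the text, so the sketch below illustrates that feature together with the row-wise normalization of the feature matrix. It assumes R-peak sample indices have already been detected (R-peak detection is outside its scope), treats each beat as the span between consecutive R peaks, and approximates the PPG slope by a first difference; these choices and the names `ptt_per_beat` and `zscore_rows` are hypothetical.

```python
import statistics

def ptt_per_beat(r_peaks, ppg, fs):
    """PTT per beat (seconds): delay from each R peak to the steepest PPG upstroke
    before the next R peak. Slope is approximated by a first difference."""
    ptts = []
    for start, end in zip(r_peaks, r_peaks[1:]):
        slopes = [ppg[k + 1] - ppg[k] for k in range(start, end - 1)]
        k_max = start + max(range(len(slopes)), key=slopes.__getitem__)
        ptts.append((k_max - start) / fs)
    return ptts

def zscore_rows(X):
    """Normalize each feature row of X (7 x T in the text) to zero mean and unit variance."""
    out = []
    for row in X:
        mu, sd = statistics.fmean(row), statistics.pstdev(row)
        out.append([(v - mu) / sd for v in row])
    return out

# Synthetic 1000 Hz signal: R peaks every second, PPG steps up 200 samples after each peak.
fs = 1000
ppg = [1.0 if (k % 1000) > 200 else 0.0 for k in range(3000)]
ptts = ptt_per_beat([0, 1000, 2000], ppg, fs)   # two complete beats, 0.2 s each
```

Normalizing per row (per feature) keeps features with very different physical units, such as seconds for PTT and beats per minute for heart rate, on a comparable scale for the network.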
IV-C Implementation Details
All RNN models were trained using minibatches of size 64 and the Adam optimizer [16]. For each minibatch, we computed the norm of the gradients, $s = \|g\|$; if $s > v$, the gradients were rescaled by $v/s$ (by default $v = 5$). We ran our model with different numbers of layers, with a hidden state size of 128 in each layer. The sequence length of each training sample was set to 32; it could be larger if a deeper model is adopted. To save computational cost, we used a bidirectional LSTM only in the first layer. Due to the limited number of training samples for our BP prediction problem, the maximum depth of the deep RNN model was set to 4 to avoid overfitting. Each dataset was divided such that 70% of the data was used for training, 10% for validation and 20% for testing. SBP, DBP and MBP were each normalized to [0, 1] by dividing by their respective maximum. For evaluation on the multiday continuous BP dataset, all deep RNN models were first pre-trained on the static BP dataset, then fine-tuned using part of the first-day data, and finally tested on the rest of the first-day data as well as the data from the following days.
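The preprocessing and training details above can be sketched as three small helpers. Assumptions: the gradient norm is the L2 norm taken over all gradients flattened into one list, the 70/10/20 split is contiguous in time (the source does not say whether data were shuffled), and the helper names are hypothetical.

```python
import math

def clip_by_global_norm(grads, v=5.0):
    """Rescale gradients by v / s when their (assumed L2) norm s exceeds the threshold v."""
    s = math.sqrt(sum(g * g for g in grads))
    return [g * v / s for g in grads] if s > v else list(grads)

def split_70_10_20(samples):
    """Contiguous 70% / 10% / 20% train / validation / test split."""
    n = len(samples)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]

def normalize_by_max(values):
    """Scale a positive BP series into (0, 1] by dividing by its maximum."""
    m = max(values)
    return [x / m for x in values]

clipped = clip_by_global_norm([6.0, 8.0])          # norm 10 > 5, scaled to [3.0, 4.0]
train, val, test = split_70_10_20(list(range(100)))
sbp_scaled = normalize_by_max([60.0, 120.0, 90.0])  # [0.5, 1.0, 0.75]
```

Clipping rescales the whole gradient vector rather than truncating individual components, so the update direction is preserved while its magnitude is bounded.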
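TABLE I: RMSE (mmHg) of BP prediction on the static continuous BP dataset.
Model | RMSE (SBP) | RMSE (DBP)
PTT-Chen [3] | 5.31 | N/A
PTT-Poon [4] | 5.75 | 3.50
BLR | 7.45 | 6.20
SVR | 6.54 | 6.28
DT | 4.45 | 2.80
Kalman Filter | 5.17 | 3.09
LSTM | 6.31 | 4.58
BiLSTM | 5.25 | 3.04
DeepRNN-2L | 5.13 | 3.73
DeepRNN-3L | 4.92 | 3.13
DeepRNN-4L | 3.73 | 2.43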
V Experimental Results
Validation on static continuous BP dataset. As shown in Table I, the PTT models yield better results than the BLR and SVR models, but are outperformed in SBP prediction by the DT, Kalman filter, bidirectional LSTM and deep RNN (DeepRNN) models. The best accuracy was obtained by our 4-layer deep RNN (DeepRNN-4L) model, which achieves an RMSE of 3.73 mmHg and 2.43 mmHg for SBP and DBP prediction, respectively. The Bland-Altman plots (Figure 4) indicate that the DeepRNN-4L predictions agree well with the ground truth, with 95% of the differences lying within the limits of agreement. Figure 6 qualitatively shows the DeepRNN-4L prediction result for a representative subject from the static continuous BP dataset.
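The Bland-Altman agreement area referred to above is conventionally bounded by the mean difference (bias) plus or minus 1.96 standard deviations of the differences. The sketch below computes these limits and is a generic illustration, not the authors' plotting code; the sign convention (prediction minus reference), the use of the sample standard deviation and the name `bland_altman_limits` are assumptions.

```python
import statistics

def bland_altman_limits(reference, predicted):
    """Bias and 95% limits of agreement: bias +/- 1.96 * SD of (predicted - reference)."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Illustrative SBP values in mmHg (not data from the paper).
ref = [120.0, 118.0, 125.0, 130.0]
pred = [121.0, 117.0, 127.0, 130.0]
bias, lower, upper = bland_altman_limits(ref, pred)   # bias = 0.5 mmHg
```

Plotting each difference against the mean of the paired measurements, with horizontal lines at `bias`, `lower` and `upper`, gives the standard Bland-Altman plot.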
By incorporating a bidirectional structure in the model, i.e., by using the bidirectional LSTM (BiLSTM), the prediction accuracy improves substantially compared with the vanilla LSTM, with a 17% decrease in SBP RMSE and a 34% decrease in DBP RMSE. Furthermore, prediction accuracy improves with increasing depth of the DeepRNN network. For instance, replacing DeepRNN-2L with DeepRNN-4L yields a 27% and 35% improvement in SBP and DBP prediction, respectively. When we stack up to a 5-layer DeepRNN, the model tends to overfit and no clear benefit from additional depth is observed.
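The percentage improvements quoted above are relative RMSE reductions, $100 \cdot (\mathrm{RMSE}_{\text{before}} - \mathrm{RMSE}_{\text{after}}) / \mathrm{RMSE}_{\text{before}}$. The short check below recomputes them from the Table I values; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage decrease in RMSE when moving from one model to another."""
    return 100.0 * (before - after) / before

table_i = {  # (SBP, DBP) RMSE in mmHg, copied from Table I
    "LSTM": (6.31, 4.58),
    "BiLSTM": (5.25, 3.04),
    "DeepRNN-2L": (5.13, 3.73),
    "DeepRNN-4L": (3.73, 2.43),
}
for base, new in [("LSTM", "BiLSTM"), ("DeepRNN-2L", "DeepRNN-4L")]:
    sbp, dbp = (relative_reduction(b, a) for b, a in zip(table_i[base], table_i[new]))
    print(f"{base} -> {new}: SBP -{sbp:.0f}%, DBP -{dbp:.0f}%")
# LSTM -> BiLSTM: SBP -17%, DBP -34%
# DeepRNN-2L -> DeepRNN-4L: SBP -27%, DBP -35%
```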
Validation on multiday continuous BP dataset. Figure 5 compares the prediction performance of the deep RNN against the reference models. The DeepRNN models clearly outperform the PTT and regression models, likely because they model the temporal dependencies in BP dynamics. The Kalman filter can model temporal dependencies in a sequence but does not perform as well as the DeepRNN models. This is likely due to its assumption that both the state transition and measurement functions are linear, which may limit its capability to model the complex temporal dependencies in BP dynamics. The best accuracy was obtained by our DeepRNN-4L model, which achieves an SBP RMSE of 3.84, 5.25, 5.80 and 5.81 mmHg on the 1st day, the 2nd day, the 4th day and 6 months after the 1st day, respectively, and a DBP RMSE of 1.80, 4.78, 5.0 and 5.21 mmHg on the same days. As shown in Figure 5, all the PTT models, regression models and the Kalman filter exhibit pronounced accuracy decay from the second day. Although the prediction accuracy of the DeepRNN model also drops after the first day, it consistently provides the lowest RMSE among all models. Figure 7 qualitatively shows the capability of DeepRNN to track long-term BP variation.
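To show where the linearity assumption enters, here is a minimal scalar Kalman filter. The source does not describe its Kalman filter baseline, so this random-walk state model and its parameters (`F`, `H`, `Q`, `R`) are hypothetical; the point is only that both the prediction step and the measurement model are fixed linear maps.

```python
def kalman_1d(measurements, F=1.0, H=1.0, Q=1e-2, R=1.0, x0=0.0, P0=1.0):
    """Scalar Kalman filter with linear state transition x_t = F x_{t-1} + w_t
    and linear measurement z_t = H x_t + v_t (process/measurement variances Q, R)."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        # Predict: the state evolves by a fixed linear map.
        x, P = F * x, F * P * F + Q
        # Update: the measurement model is also linear.
        K = P * H / (H * P * H + R)
        x = x + K * (z - H * x)
        P = (1.0 - K * H) * P
        estimates.append(x)
    return estimates

est = kalman_1d([120.0] * 200)   # converges toward a constant 120 mmHg reading
```

Any nonlinear relation between the features and BP, or nonlinear BP regulation dynamics, has to be approximated by these fixed linear maps, which is the limitation the comparison above points to.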
Importance of residual connections. To investigate the importance of residual connections, we conducted an ablation study on the static continuous BP dataset. As shown in Table II, the DeepRNN model with residual connections works considerably better than its counterpart without them. During training, we found that residual connections significantly improve the gradient flow in the backward pass, which makes the deep network easier to optimize; the more expressive deep structure can then translate into better performance. The reason for this benefit is analyzed in Section III.
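TABLE II: Effect of residual connections on the static continuous BP dataset (RMSE, mmHg).
Model | RMSE (SBP) | RMSE (DBP)
DeepRNN-4L w/o residual | 5.31 | 3.13
DeepRNN-4L w/ residual | 3.73 | 2.43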
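TABLE III: Effect of multitask training on the static continuous BP dataset (RMSE, mmHg); † denotes multitask training.
Model | RMSE (SBP) | RMSE (DBP)
DeepRNN-2L | 6.24 | 4.55
DeepRNN-3L | 5.05 | 3.30
DeepRNN-4L | 4.27 | 3.02
DeepRNN-2L † | 5.13 | 3.73
DeepRNN-3L † | 4.92 | 3.13
DeepRNN-4L † | 3.73 | 2.43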
Importance of multitask training. Table III shows that the multitask training strategy improves prediction performance compared with training a separate model for each target. This can be explained by the fact that the training objectives of the different tasks are strongly correlated and thus share many data representations that capture the underlying factors, which can be learned by the same model structure. Learning these shared representations can therefore improve the generalization ability of the model.
VI Conclusions
In this work, we demonstrated that modeling the temporal dependencies in BP dynamics can substantially improve long-term BP prediction accuracy, which is one of the most challenging problems in cuffless BP estimation. We proposed a novel deep RNN that incorporates a bidirectional LSTM and residual connections to tackle this challenge. The experimental results show that the deep RNN model achieves state-of-the-art accuracy on both static and multiday continuous BP datasets.
References
[1] S. S. Lim, T. Vos, A. D. Flaxman, G. Danaei, K. Shibuya, H. Adair-Rohani, M. A. AlMazroa, M. Amann, H. R. Anderson, K. G. Andrews, et al., “A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010,” The Lancet, vol. 380, no. 9859, pp. 2224–2260, 2012.
[2] A. C. Guyton, T. G. Coleman, A. W. Cowley, K. W. Scheel, R. D. Manning, and R. A. Norman, “Arterial pressure regulation: overriding dominance of the kidneys in long-term regulation and in hypertension,” The American Journal of Medicine, vol. 52, no. 5, pp. 584–594, 1972.
[3] W. Chen, T. Kobayashi, S. Ichikawa, Y. Takeuchi, and T. Togawa, “Continuous estimation of systolic blood pressure using the pulse arrival time and intermittent calibration,” Medical and Biological Engineering and Computing, vol. 38, no. 5, pp. 569–574, 2000.
[4] C. Poon and Y. Zhang, “Cuff-less and noninvasive measurements of arterial blood pressure by pulse transit time,” in 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. IEEE, 2006, pp. 5877–5880.
[5] F. Miao, N. Fu, Y.-T. Zhang, X.-R. Ding, X. Hong, Q. He, and Y. Li, “A novel continuous blood pressure estimation approach based on data mining techniques,” IEEE Journal of Biomedical and Health Informatics, 2017.
[6] M. Jain, N. Kumar, S. Deb, and A. Majumdar, “A sparse regression based approach for cuffless blood pressure measurement,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 789–793.
[7] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[8] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[9] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[10] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.
[11] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, et al., “Deep Speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.
[12] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” ICML (3), vol. 28, pp. 1310–1318, 2013.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[14] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015.
[15] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
[16] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.