Internet of Things (IoT) devices are widely used to monitor and send the data to the Cloud for centralized storage and execution of downstream tasks. For example, hospitals use IoT medical devices to constantly monitor Electrocardiogram (ECG) signals, the patient heart’s activity over time. ECG signal analysis can alert the staff of abnormal heart behavior and, in general, is a critical tool for diagnosing heart disease. Typical ECG diagnosis downstream tasks include extraction of features, like R-R peaks [vijayarangan2020rpnet], QRS complexes [hong2020opportunities], and heart rate (HR) arrhythmia detection [rajpurkar2017cardiologist].
Such medical monitoring settings generate large amounts of continuous sensor data to be sent to the Cloud. Transmitting the raw signal would imply power-hungry devices and high processing and storage Cloud costs. Therefore, an effective data compression scheme is required to reduce the transmission and storage requirements. An efficient compression module is typically deployed on the device to accommodate settings with low power resource-limited IoT devices, while a decompression module is deployed in the Cloud. Fig. 1 presents an end-to-end data flow in a Cloud-based monitoring system111Some systems place additional servers on-site to gateway and aggregate multiple sensors. In such settings, compression is applied at the servers.: compressed data segments are prepared for transmission, and the data is decompressed back to raw ECG signal in the Cloud for further processing and analysis.
General ECG compression techniques that fit IoT settings belong to one of two categories: lossless or lossy. Lossless compression methods preserve the signal’s complete information but tend to achieve low compression gain (CG). In contrast, lossy techniques obtain high CG at the expense of losing information, and hence reduce the performance of downstream tasks. For example, we show in Table I that LZ77 [welch1984technique], a common lossless compression, achieves a CG of 2.7 on widely used ECG datasets (see Section VI-A) with data segments of 2K samples each, rendering the scheme inappropriate in many scenarios. On the other hand, utilizing a state-of-the-art (SOTA) lossy compression scheme, based on a convolutional autoencoder (CAE) [yildirim2018efficient] tuned to a high fixed CG of 32, leads to significant performance degradation in signal reconstruction and heart rate classification.
Previous ECG works focus either on compressing schemes optimized for signal reconstruction (and can result in low downstream task performance) or models optimized for extracting or classifying ECG features (and not necessarily lend themselves to effective compression)[hong2020opportunities]. One approach to bridge the gap between the two is to allow practitioners to control the tradeoff between compression level and tasks’ performance. Further, a task-aware design that achieves low reconstruction error may support future downstream tasks through the reconstructed signal.
In this paper, we aim to provide a lossy compression method in which the system administrator (admin) can configure the desired performance level, namely an upper bound on downstream tasks’ error, while maintaining an optimized CG. Moreover, the method needs to be resource efficient to fit in an IoT device without adding significant overhead.
We propose Dynamic Deep (see Fig. 3), a task-aware compression that uses convolutional autoencoders, where the compression level is optimized for each data segment sent to the Cloud. The dynamic behavior allows the scheme to achieve impressive CG without violating the downstream tasks’ performance requirements.
A straightforward approach to extend an existing CAE to support dynamic operation is to choose the optimal level of compression based on the reconstructed signal. Such a solution requires feedback from the downstream tasks and thus is not a fit for an IoT setting where the compression and decompression modules are decoupled. To address this, we extend a classical CAE architecture also to learn to predict the downstream tasks’ feedback as part of the compressed representation. To further reduce the computational costs of our model, we apply standard (deep learning) architecture optimization techniques. We believe that our work may be the first to propose a dynamic compression scheme with a tunable downstream task error that easily matches the design requirements for IoT medical applications.
We conducted a comprehensive evaluation of our proposed method using two complex real-life ECG applications, R-R peaks detection [vijayarangan2020rpnet] and HR arrhythmia classification [rajpurkar2017cardiologist]. Our compression method increases HR arrhythmia classification the latter’s F1-score by 3 times, and the average CG by up to 83% (with a maximum CG of 64), compared to using a SOTA CAE compression with a fixed CG [yildirim2018efficient], as shown in Table I
. These improvements stem from training the compressor with the downstream task loss feedback and applying a dynamic compression approach. Dynamic-Deep successfully preserved the desired error bound for all data segments; Whereas SOTA CAE violates the HR arrhythmia classification task’s error bound for 10.21% of the data segments. Here, the classification task’s error is defined as the categorical cross entropy (CCE) loss, a common loss function for classification models. Further, the memory footprint of our solution is 67% lower than that of the CAE. In addition, we deployed our solution on the Google Cloud Platform to study its real-world Cloud costs. Decompressing data traffic equivalent to that of a small-mid-sized hospital, we observe that Dynamic-Deep reduces Cloud expenses by up to 97% compared to that of non-compressed data traffic.
Our analysis reveals a low correlation between reconstruction error and the (tested) downstream task’s error. The low correlation may be caused by the objectives difference, continuously tracking the signal vs. capturing sporadic events. Hence, we train our model by fine tuning a reconstruction-optimized model with the downstream tasks combined loss. Our method can be easily extended to incorporate additional downstream tasks by adding the corresponding loss functions. Moreover, while we present the results of Dynamic-Deep on medical ECG data, our method is general, and thus can be extended to support other sensor types and potentially other domains by simply adapting the compressor modules 44footnotemark: 4
This paper is organized as follows. Section II discusses related works on ECG signal compression. Sections III-V explain the motivation, the high-level and detailed design of our proposed solution. Next, evaluation experiments are presented in VI. Finally, we conclude in VII.
Ii Related Work
Existing compression techniques were adapted into medical IoT environments, typically to fit the low power requirements [ukil2015iot], [djelouat2018compressive]. Such methods are split into lossless [welch1984technique] and lossy [djelouat2018compressive] categories. Lossless achieves low CG on signals, such as ECG [koski1997lossless], while lossy compression is highly efficient (in reducing storage requirements), and thus fit for IoT sensing.
A common approach for lossy compression is transform-based, which seeks to preserve the crucial parts of the signal’s representation in the transformed domain. Few notable examples are Fourier transform[reddy1986ecg], Wavelet transform [chen1993ecg] or the Cosine transform [ahmed1974discrete], each with its own domain transformation preference. The transformed representation of ECG data is often sparse, and thus preserving the right parts of the representation enables one to get a reconstruction signal of potentially high fidelity [djelouat2018compressive], [cheng2007data]. In [ukil2015iot] a proposal for a dynamic scheme that adopts the threshold on the preserved representation size is suggested. The main drawback of the transform-based approach is the use of a predefined domain transformation, which may not lead to the highest CG for the desired reconstruction level.
Neural network (NN) based compression. NNs automate the process of searching for an optimal domain transformation for the compressed representation. Auto-Encoders (AE), a family of NNs, were extensively studied [hinton2006reducing], [goroshin2013saturating] and shown to effectively learn an expressive-yet-efficient representation of ECG segments, and thus provide a higher CG than transform-based compression methods [ukil2015iot]. Recent work [yildirim2018efficient] uses a convolutional AE with 27 layers to achieve SOTA compression results. In general, AE architectures have a single fixed compression level. We study how to extend such architecture to dynamically adapt across multiple compression levels.
Task-aware compression. Recent NN-based data compression architectures consider the joint performance of signal reconstruction and single downstream task (such as Person keypoint detection [pinkham2020algorithm] or 3D point cloud classification tasks [dovrat2019learning]). They show an increase in downstream tasks’ performance and compression performance compared to isolated compressors optimization. However, none of them introduce multiple downstream tasks’ performance analysis. Dynamic-Deep applies such an approach for the ECG domain and studies how to tune a multiple task-aware compressors to optimize CG results.
Iii Motivation For Balancing CG And Downstream Tasks Performance
Lossy compression methods, such as SOTA CAE [yildirim2018efficient], are based on an architecture with a single fixed compression level. However, using a fixed compression level may not necessarily satisfy a desired bound on task’s error for every data segment. Fig. 3
presents the CCE quartiles across ECG segments for different fixed compression levels: 128, 64, 32, 16 and none (see sectionVI-A for tasks’ and datasets’ details). Denote CAExx, an extension of SOTA CAE implementation to a CG of xx. We use CAE0 to represent no compression. Non zero losses may occur in uncompressed operations due to the inherent error of the HR detection model.
Let an example scenario be when the admin bound the HR arrhythmia classification loss (CCE) to 0.75 (see Fig. 3), none of the fixed compressors can satisfy this bound. For instance, to meet the admin’s defined bound with CAE32, 75% of data segments can be compressed with CG of 32, and the rest 25% handled as uncompressed, yielding an average CG of 24. However, note that CAE64 meets the upper bound for approximately more than 70% of data segments. With 3-compression levels, each data segment can be compressed with the highest CG level that satisfies the admin’s configured upper bound error. So in the above example, around 75% of data segments can be compressed with a high CG of 64 or 32 while the rest 25% remain uncompressed, reaching CG of 48.31 (See section VI).
Balancing CG and the downstream tasks’ error is challenging, as the compressor needs the task’s error feedback for every level of compression to optimize for the highest CG. But, in IoT settings, downstream tasks are decoupled from the compressor and located in the Cloud. We explain how to manage and provide the error feedback in the next section.
Iv Dynamic-Deep High Level Design
To receive feedback from the downstream task, we propose a solution that extends the classical CAE architecture with multiple compression levels in a task-aware fashion (for discussions regarding the number of levels see section LABEL:sec:cl_cg_tune). (discussions regarding the number of levels are in the next section) Dynamic-Deep consists of the following (as shown in Fig. 3):
Compression Module: set of encoders that compress every raw data segment into multiple compression level representations. To reduce the need for actual feedback from the downstream tasks, we employ a dense layer trained to predict the downstream tasks’ weighted error for each compression level. We choose the highest feasible CG based on the pre-configured upper bound error and the feedback prediction.
Decompression Module: set of decoders that accept multiple representations of different compression levels and reconstruct the data segment.
For optimizing the joint performance of downstream and reconstructed tasks, the training phase requires differentiable downstream tasks (e.g. NN-based). Nonetheless, after the training phase, the reconstructed signal (from the pre-trained compressor) allows executing additional downstream tasks not necessarily differentiable. We found a low correlation between reconstruction error and downstream tasks’ error (see Section V-D). Thus, Dynamic-Deep is designed to predict the downstream tasks’ error rather than the reconstruction error as a proxy for the error feedback.
V Dynamic-Deep Implementation Details
V-a Multiple Compression Levels
The following architectural changes were made to extend the CAE32 to support a CG of 64 and reach 3-compression levels (64,32 or no compression). We chose 3-compression levels based on experiments comparing the performance (and footprint) of different numbers of compression levels, including CAE16. We obtained highest impact with 3-compression levels [dynamic-deeptechreport] (see Section LABEL:sec:cl_cg_tune) . Denote conv(x,y) as a convolutional layer with x number of filters, kernel size of y, and Upsample(z) as an upsample layer with size z. To support CG of 64, convolutional layers conv(64,7) and conv(1,3) were added after CAE32 encoder’s layer number 12, yielding an output shape of (31,1).
A decoder adapter transforms the compressed representation of 64 to input the CAE32’s decoder. The adapter has four layers: conv(16,7), conv(32,3), Upsample(2), conv(32,3), Upsample(2) with an output size of (124,32). The output of the adapter is the input of layer 18 in CAE32. Each compressed representation has its dense layer
to support the downstream tasks’ prediction. The dense layer’s output shape is (1,1), and it applies a ReLU activation[agarap2018deep].
V-B Minimizing Memory Footprint For IoT Support
Deploying compressing modules on lightweight IoT devices requires a resource-constrained implementation. We use two encoders to support 3-compression levels, which results in a 1.06MB memory footprint for the extended SOTA CAE. We use several techniques to reduce the model’s memory size:
Deep learning compression techniques: The number of learnable parameters of the CAE encoder’s 10th layer is reduced by decreasing the number of filters and kernel size. We compensate and preserve compression performance by preserving the receptive field using convolutional layers with stride=2 instead of pooling layers[cheng2017survey]. Note that the encoder’s 10th layer is used in all compression levels.
Sharing layers: first layers of each encoder (CAE32 and CAE64) have the same learnable parameters, therefore memory and computation are reused and avoid linear memory increase for each compression level. Only the last encoder’s layers of each compression level are computed uniquely. Fig. 4 illustrates the final design.
Table II summarizes the resulting memory size when Applying the above techniques. It decreases the number of parameters to only 84K parameters which are 67% fewer parameters to the straightforward 3-compression level CAE’s encoder.
|Original 2-compression level (CAE based)||133K||532|
|Original 3-compression level (CAE based)||266K||1064|
|Deep learning compression techniques||168K||678|
|Sharing layers (Dynamic-Deep 3-compression level)||83K||332|
V-C Combined Loss Function
Dynamic-Deep has three loss functions accumulated to a combined loss function. The reconstruction loss , calculates the distance between each sample in original data segment and the reconstructed data segment over samples.
The downstream task weighted error , accumulates the downstream tasks’ loss functions. Let denote task , and and denote its loss function and the weighted (scaling) factor of the loss function, respectively.
The combined loss function , combines the reconstruction loss with the downstream tasks’ weighted error . scales the to balance between reconstruction performance and downstream tasks performances.
Finally, Dynamic-Deep learns to predict using the mean squared error (MSE).
We observe a low correlation between the tested downstream tasks and the reconstruction errors across a wide range of the weighting factors (see tech report [dynamic-deeptechreport]) (see Section LABEL:sec:low_correlation) . Hence, training a downstream task in isolation on the reconstructed signal may result in limited performance.
We thus train Dynamic-Deep in three phases, where the second phase is repeated for each additional downstream task. First, we train the compressor to optimize the reconstruction task (see eq. (1)). Here, we use the MIT-BIH dataset by applying the preprocessing described in [yildirim2018efficient]. Second, we fine-tune with cascaded downstream tasks loss, including reconstruction loss (see eq. (3
)). The downstream tasks’ models’ weights are frozen and trained for additional 20 epochs with the CinC dataset555For the essence of evaluation, each training phase uses the same dataset the downstream task baseline uses. A benchmark for RPNet was created on CinC dataset to allow multiple downstream tasks training., by applying the preprocessing described in [rajpurkar2017cardiologist]. Finally, we train the dense layer to predict the downstream tasks’ weighted error for additional 10 epochs. As before, the compressor is frozen to not penalize previous training phases.
Vi Experimental Results
Vi-a Datasets and Downstream Tasks
Datasets: used for training and evaluation:
MIT-BIH [mark1988bih]: used to evaluate ECG compression as it includes different types of noise patterns and various shapes of arrhythmic QRS complexes [hong2020opportunities]. The benchmark contains 48 half-hour ambulatory ECG recordings with 11-bit resolution and sample rate of 360Hz.
CinC [clifford2015physionet]: Captured from the AliveCor ECG monitor and contains about 7000 records. These records have annotations for Atrial Fibrillation(AF), noise, other rhythms or normal. The data was stored at a sample of 300 Hz. Each record contains 8960 samples with 35 annotations.
Downstream tasks: two kind of tasks (NN based) architectures were chosen:
R-R peak extraction(RPnet): implemented using NN [vijayarangan2020rpnet] as a regression task
HR arrhythmias classification: implemented using convolutional NN (CNN) [rajpurkar2017cardiologist] as a classification task.
Both are commonly used tasks in real-world ECG applications.
Vi-B Task Awareness Evaluation
We focus our evaluation on the Dynamic-Deep downstream tasks’ predictions performed by the dense layer in the compression module. We compare our predictions to two theoretical models, in which the downstream task feedback is available for the compressor:
2-level feedback-aware: the method has 2-compression levels of 64 or no compression.
3-level feedback-aware: the method has 3-compression levels of 64, 32 or no compression.
Each model executes the downstream tasks for every data segment and measures the error at each compression level. Then, they choose the highest compression that meets the configured upper bound error. If none meets the upper bound, the no compression level is chosen. Note that such a method is not applicable in typical IoT settings since the feedback is not readily available at the edges. We evaluated Dynamic-Deep vs. the theoretical models above considering the following setups:
Single downstream task awareness: of HR arrhythmia classification or R-R peak extraction. Fig. 5 shows that Dynamic-Deep follows the trends of the theoretical method successfully. Increasing the upper bound error increases the CG and the effective task loss and vice versa. For every configured upper bound Dynamic-Deep results with a lower effective task loss than the configured upper bound. There is an improvement in CG for both tasks when increasing the number of compression levels from 2 to 3. See tech report for analysis of a higher number of compression levels and results on additional downstream task, R-R peak extraction [dynamic-deeptechreport].
For the RPNet task, moving from 2 to 3 compression levels yields a diminishing return. The diminishing return is a result of low difference in loss quartiles between compression levels as explained in section III. Nonetheless, Dynamic-Deep successfully follows the trends and chooses the optimal decision (see fig. LABEL:fig:RPNet_dynamic_deep).
Multiple downstream tasks’ awareness: of both R-R peak extraction and HR arrhythmia classification. Supporting multiple downstream tasks introduces a tradeoff on which downstream task to optimize. Dynamic-Deep uses a downstream task weighted error (see eq. 2), which can be viewed as a single task awareness. Fig. 6 demonstrates that multi-task awareness has similar results to a single task awareness.
When designing Dynamic-Deep, two additional parameters are taken into account: the number of compression levels and the CG of each compression level. The following section presents different experiments that helped design the final architecture of Dynamic-Deep. Higher CG than 64 improved the theoretical feedback-aware up to an average CG of 100, however, Dynamic-Deep had low performance utilization of those levels (see in [dynamic-deeptechreport]).
Vi-C Cloud Cost Reduction Analysis Using Dynamic-Deep
Compression methods are used in IoT settings to reduce storage costs and networking bandwidth. For Cloud costs evaluation, we consider storage and computation (data decompression for consumption by downstream tasks) costs since Cloud inbound traffic is usually free. We compare the following operational models:
Dynamic-Deep: IoT device sends a compressed representation, which is stored in the Cloud, and Cloud side decompression phase is applied before downstream tasks’ execution.
Dynamic-Deep with uncompressed: IoT device sends both compressed and uncompressed representations. The compressed representation is stored while downstream tasks operate on the uncompressed representation.
We assume that a domain expert reviews some portion of historical sensor data, and accounts for the corresponding overhead, the cost of fetching data from storage and decompressing it, in both models.
We ran these two models on Google Cloud Platform [google] using ECG data segment traffic equivalent to a small-mid hospital with 200 beds. We configured the upper bound error of HR arrhythmia classification to be 0.75 and received an average CG of 48.31. We measured the computation expenses of our setup on an N1-Custom instance with 1 CPU, 2GB RAM, Intel Xeon 2.2GHz. Fig. 7 presents the measured results. Lossless compression reduces expenses by 63% regardless of the specific architecture due to its low computation usage. Dynamic-Deep with uncompressed architecture saves up to 97% cost expenses compared to no compression solution and is more efficient than lossless even with 100% data fetching.
We presented a dynamic compression method that leverages downstream task feedback prediction to improve compression gain and allow the system admin to configure the desired performance level. We successfully showed CG improvements on two types of downstream tasks using two ECG signals datasets. Our future work will try to apply the approach on other domains and study the impact of tuning additional CG with different number of compression levels.
Acknowledgment - We thank Guy Vinograd from bio-T for his insightful comments, and are also deeply grateful to Elad Levy for his valuable feedback on model design and analysis.