With the recent technological advances in Internet of Things (IoT) applications, more than one billion connected objects are expected to be launched worldwide by 2025 (https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide). Power consumption is among the biggest challenges facing connected objects, particularly in industrial domains, where sensing systems are commonly deployed in the field to run for days or even weeks without being recharged. Typically, such devices use sensors to capture properties such as temperature or motion, and stream them to a host system over a radio transmission protocol such as Bluetooth Low-Energy (BLE). System designers aim to reduce the rate of data transmission as much as possible, as radio transmission is a power-hungry operation.
Compression is a key technique to reduce the rate of radio transmission. While in several applications lossless compression methods are more desirable than lossy compression techniques, in the context of IoT and sensor data streams, the measured sensor data intrinsically involves noise and measurement errors, which can be treated as a configurable tolerance for a lossy compression algorithm.
The Lightweight Temporal Compression method (LTC [schoellhammer2004lightweight]) has been designed specifically for energy-constrained systems, initially sensor networks. It approximates data points by a piece-wise linear function that guarantees an upper bound on the reconstruction error, and a reduced memory footprint in $\mathcal{O}(1)$. However, LTC has only been described for 1D streams, while streams acquired by connected objects, such as acceleration or gyroscopic data, are often multi-dimensional.
In this paper, we extend LTC to dimension $n$. To do so, we propose an algebraic formulation of the algorithm that also yields a norm-independent expression of it. We implement our extension on Motsai’s Neblina module (https://motsai.com/products/neblina), and we test it on 3D acceleration streams acquired during human exercises, namely biceps curling, walking and running. Our implementation of LTC is available as free software.
We assume that the stream consists of a sequence of data points received at uneven intervals. The compression algorithm transmits fewer points than it receives. The transmitted points might be included in the stream, or computed from stream points. The compression ratio is the ratio between the number of received points and the number of transmitted points. An application reconstructs the stream from the transmitted points: the reconstruction error is the maximum absolute difference between a point of the reconstructed stream, and the corresponding point in the original stream.
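As an illustration of these two definitions, the following minimal sketch computes the compression ratio and the reconstruction error for a 1D stream. It assumes that the application reconstructs the stream by linear interpolation between consecutive transmitted points; the function names and the sample values are hypothetical and only serve the example.

```python
from bisect import bisect_right

def compression_ratio(n_received, n_transmitted):
    """Number of received points divided by number of transmitted points."""
    return n_received / n_transmitted

def reconstruct(t, transmitted):
    """Linearly interpolate the transmitted points at time t.

    `transmitted` is a list of (time, value) pairs sorted by time,
    with at least two entries (assumption of this sketch).
    """
    times = [p[0] for p in transmitted]
    i = bisect_right(times, t) - 1
    i = max(0, min(i, len(transmitted) - 2))  # clamp to a valid segment
    (t0, x0), (t1, x1) = transmitted[i], transmitted[i + 1]
    return x0 + (x1 - x0) * (t - t0) / (t1 - t0)

def reconstruction_error(received, transmitted):
    """Maximum absolute difference between original and reconstructed points."""
    return max(abs(x - reconstruct(t, transmitted)) for t, x in received)

# Hypothetical stream: four received points, two transmitted points.
received = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0)]
transmitted = [(0.0, 0.0), (3.0, 3.0)]
ratio = compression_ratio(len(received), len(transmitted))
error = reconstruction_error(received, transmitted)
```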
Section II provides some background on the LTC algorithm, and formalizes the description initially proposed in [schoellhammer2004lightweight]. Section III presents our norm-independent extension to dimension $n$, and Section IV describes our implementation. Section V reports on experiments to validate our implementation, and evaluates the impact of n-dimensional LTC on energy consumption of connected objects.
II Lightweight Temporal Compression
LTC approximates the data stream by a piece-wise linear function of time, with an error bounded by parameter $\epsilon$.
The algorithm receives a stream of data points $x_i$ at times $t_i$ ($i \in \mathbb{N}$), and it transmits a stream of data points $\xi_k$ at times $\tau_k$ ($k \in \mathbb{N}$). To simplify the notations, we assume that:
$$\forall k \in \mathbb{N},\ \exists!\, i \in \mathbb{N},\quad \tau_k = t_i$$
That is, transmission times coincide with reception times. We define the shifted received points as follows:
$$\forall k \in \mathbb{N},\ \forall j \in \mathbb{N}^*,\quad (u^k_j, y^k_j) = (t_{i+j}, x_{i+j})$$
where $i$ is such that $t_i = \tau_k$ and:
$$(u^k_0, y^k_0) = (\tau_k, \xi_k)$$
This definition is such that $y^k_j$ is the $j^{\text{th}}$ data point received after the $k^{\text{th}}$ transmission and $u^k_j$ is the corresponding timestamp. Figure 1 illustrates the notations and algorithm.
The LTC algorithm maintains two lines, the high line and the low line, defined by (1) the latest transmitted point and (2) the high point (high line) or the low point (low line). When a point $(u^k_j, y^k_j)$ is received, the high line is updated as follows: if $y^k_j + \epsilon$ is below the high line then the high line is updated to the line defined by the last transmitted point and $(u^k_j, y^k_j + \epsilon)$; otherwise, the high line is not updated. Likewise, the low line is updated from $y^k_j - \epsilon$. Therefore, any line located between the high line and the low line approximates the data points received since the last transmitted point with an error bounded by $\epsilon$.
Using these notations, the original LTC algorithm can be written as in Algorithm 1. For readability, we assume that access to data points is blocking, i.e., the program will wait until the points are available. We also assume that the content of variable tr is transmitted after each assignment of this variable. Function line, omitted for brevity, returns the ordinate, at the abscissa given as its 1st argument, of the line defined by the points given as its 2nd and 3rd arguments.
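The high-line and low-line mechanism described above can be sketched as follows. This is a minimal, non-blocking illustration rather than a transcription of Algorithm 1: it tracks each line by its slope relative to the latest transmitted point, and it assumes that the transmitted point is taken on the line halfway between the high and low lines. The function name `ltc_1d` and these choices are assumptions of the sketch.

```python
def ltc_1d(points, eps):
    """Compress (time, value) pairs with strictly increasing times.

    Returns the list of transmitted (time, value) pairs.
    Assumption of this sketch: the transmitted value lies on the line
    halfway between the high line and the low line.
    """
    it = iter(points)
    t0, x0 = next(it)                      # first point is transmitted as-is
    out = [(t0, x0)]
    lo, hi = float("-inf"), float("inf")   # slopes of the low and high lines
    t_prev = t0
    for t, x in it:
        dt = t - t0
        new_lo = max(lo, (x - eps - x0) / dt)   # low line can only rise
        new_hi = min(hi, (x + eps - x0) / dt)   # high line can only fall
        if new_lo > new_hi:
            # No single line fits any more: transmit a point at the
            # previous timestamp and restart from it.
            x0 = x0 + 0.5 * (lo + hi) * (t_prev - t0)
            t0 = t_prev
            out.append((t0, x0))
            dt = t - t0
            new_lo = (x - eps - x0) / dt
            new_hi = (x + eps - x0) / dt
        lo, hi, t_prev = new_lo, new_hi, t
    if t_prev != t0:
        # Flush the end of the stream.
        out.append((t_prev, x0 + 0.5 * (lo + hi) * (t_prev - t0)))
    return out
```

On a perfectly linear stream the sketch transmits only the first and last points; each change of slope larger than the tolerance triggers one additional transmission.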
III Extension to dimension $n$
In this section we provide a norm-independent formulation of LTC in dimension $n$. By $n$ we refer to the dimension of the data points $x_i$. To handle time, LTC actually operates in dimension $n+1$.
III-A Preliminary comments
We note that the formulation of LTC in [schoellhammer2004lightweight] relies on the intersection of convex cones in dimension $n+1$. For $n=1$, it corresponds to the intersection of triangles, which can efficiently be computed by maintaining boundary lines, as detailed previously. In higher dimension, however, cone intersections are not so straightforward, because the intersection between cones may not be a cone.
To address this issue, we formulate LTC as an intersection test between balls of dimension $n$, that is, segments for $n=1$, disks for $n=2$, etc. Balls are defined from the norm $\|\cdot\|$ used in the vector space of data points. For $n=1$, the choice of the norm does not really matter, as all p-norms and the infinity norm are identical. In dimension $n \geq 2$, however, norm selection will be critical.
III-B Algebraic formulation of LTC
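Let $(u^k_0, y^k_0)$ be the latest transmitted point. For convenience, all the subsequent points will be expressed in the orthogonal space with origin $(u^k_0, y^k_0)$. We denote by $(v_j, z_j)$ such points:
$$\forall j \in \mathbb{N}^*,\quad (v_j, z_j) = (u^k_j - u^k_0,\ y^k_j - y^k_0)$$
Let $B_j$ be the ball of $\mathbb{R}^n$ of centre $z_j / v_j$ and radius $\epsilon / v_j$:
$$B_j = \left\{ z \in \mathbb{R}^n,\ \left\| z - \frac{z_j}{v_j} \right\| \leq \frac{\epsilon}{v_j} \right\}$$
Note that $B_j$ is defined as soon as one point is received after the last transmission.
III-B2 LTC property
We define the LTC property as follows:
$$\exists z \in \mathbb{R}^n,\ \forall j \in \{1, \ldots, m\},\quad \| v_j z - z_j \| \leq \epsilon$$
where $m$ is the number of points received since the last transmission.
The original LTC algorithm ensures that the LTC property is verified between each transmission. Indeed, all the points $z$ such that the line of slope $z$ through the origin is between the high line and the low line verify the property. Line 13 in Algorithm 1 guarantees that such a point exists.
The LTC property can be re-written as follows:
$$\bigcap_{j=1}^{m} B_j \neq \emptyset$$
Note that $(B_j)_{j \geq 1}$ is a sequence of balls of strictly decreasing radius, since $v_{j+1} > v_j$.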
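The step between the two forms of the LTC property can be made explicit. The following short derivation assumes that $(v_j, z_j)$ denotes the $j^{\text{th}}$ point received since the last transmission, expressed relative to the latest transmitted point (so that $v_j > 0$), that $\epsilon$ is the error bound, and that $B_j$ is the ball of centre $z_j / v_j$ and radius $\epsilon / v_j$. It only uses the homogeneity of the norm:

```latex
\begin{align*}
\| v_j z - z_j \| \leq \epsilon
  &\iff v_j \left\| z - \frac{z_j}{v_j} \right\| \leq \epsilon
  && \text{(homogeneity, } v_j > 0 \text{)} \\
  &\iff \left\| z - \frac{z_j}{v_j} \right\| \leq \frac{\epsilon}{v_j} \\
  &\iff z \in B_j
\end{align*}
```

A point $z$ therefore satisfies the error bound for every received point exactly when it belongs to every ball $B_j$, and this equivalence holds for any norm.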
III-D Ball intersections
Although Algorithm 2 looks simple, one should not overlook the fact that there is no good general algorithm to test whether a set of balls intersect. The best general algorithm we could find so far relies on Helly’s theorem which is formulated as follows [helly1923mengen]:
Let $\{X_1, \ldots, X_m\}$ be a finite collection of convex subsets of $\mathbb{R}^n$, with $m > n$. If the intersection of every $n+1$ subsets is non-empty, then the whole collection has a non-empty intersection.
This theorem leads to an algorithm of complexity $\mathcal{O}\left(\binom{m}{n+1}\right)$, which is not usable in resource-constrained environments.
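To give an order of magnitude, the sketch below counts the subsets of $n+1$ balls that a direct application of Helly's theorem would have to test. The function name and the sample sizes are hypothetical; the count itself is the binomial coefficient.

```python
import math

def helly_subset_count(m, n):
    """Number of (n+1)-subsets among m convex sets in dimension n."""
    return math.comb(m, n + 1)

# Hypothetical sizes: 100 balls accumulated since the last transmission, 3D data.
tests_3d = helly_subset_count(100, 3)
```

With these sizes, millions of 4-ball intersection tests would be needed for a single decision, which illustrates why this approach does not fit a micro-controller.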
The only feasible algorithm that we found is norm-specific. It maintains a representation of the intersection which is updated at every iteration. The intersection tests can then be done in constant time. However, updating the representation of the intersection may be costly depending on the norm used. For the infinity norm, the representation is a rectangular cuboid which is straightforward to update by intersection with an n-ball. For the Euclidean norm, the representation is a volume with no particular property, which is more costly to maintain.
III-E Effect of the norm
As mentioned before, norm selection in dimension $n \geq 2$ has a critical impact on the compression error and ratio. To appreciate this effect, let us compare the infinity norm and the Euclidean norm in dimension 2. By comparing the unit disk (area $\pi$) to a square of side 2 (area 4), we obtain that the compression ratio of a random stream would be $4/\pi \approx 1.27$ times larger with the infinity norm than with the Euclidean norm. In 3D, this ratio would be $6/\pi \approx 1.91$. Conversely, a compression error bounded by $\epsilon$ with the infinity norm corresponds to a compression error of up to $\epsilon\sqrt{n}$ with the Euclidean norm. Unsurprisingly, the infinity norm is more tolerant than the Euclidean norm.
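The geometric comparison above can be checked numerically. The sketch below uses the standard closed form for the volume of the Euclidean unit ball in dimension $n$ and compares it with the cube of side 2, which is the unit ball of the infinity norm; the function names are hypothetical.

```python
import math

def ball_volume(n, r=1.0):
    """Volume of the Euclidean n-ball of radius r."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

def cube_to_ball_ratio(n):
    """Volume of the cube of side 2 divided by the volume of the unit n-ball."""
    return 2.0 ** n / ball_volume(n)

def euclidean_error_bound(eps, n):
    """Largest Euclidean error compatible with an infinity-norm error of eps."""
    return eps * math.sqrt(n)

ratio_2d = cube_to_ball_ratio(2)   # 4 / pi
ratio_3d = cube_to_ball_ratio(3)   # 6 / pi
```

The ratio grows quickly with the dimension, so the gap between the two norms is expected to widen for sensors with more than three axes.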
It should also be noted that using the infinity norm in $\mathbb{R}^n$ boils down to the use of the 1D LTC algorithm independently in each dimension, since a data point will be transmitted as soon as the linear approximation does not hold in any of the dimensions. For the Euclidean norm, however, the multidimensional and multiple unidimensional versions are different: the multiple unidimensional version behaves as the infinity norm, but the multidimensional version is more stringent, leading to a reduced compression rate and error.
To choose between the multidimensional implementation and multiple unidimensional ones, we recommend checking whether the desired error bound is expressed independently for every sensor, or as an aggregate error between them. The multidimensional version is more appropriate for multidimensional sensors, for instance 3D accelerometers or 3D gyroscopes, and the multiple unidimensional version is more suitable for multiple independent sensors, for instance a temperature and a pressure sensor.
To implement LTC in nD with the infinity norm, we maintain a cuboid representation of the intersection of the balls across the iterations of the while loop in Algorithm 2. The implementation works with constant memory and requires limited CPU time.
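The cuboid update can be sketched as follows. This is a minimal illustration, not the code released with the paper: the cuboid is stored as one interval of admissible slopes per dimension, it is intersected with the cube associated with each new point, and a point is transmitted when the intersection becomes empty. The function name `ltc_inf` and the choice of transmitting along the cuboid centre are assumptions of this sketch.

```python
def ltc_inf(points, eps):
    """n-dimensional LTC with the infinity norm.

    `points` yields (time, values) pairs with strictly increasing times,
    where `values` is a sequence of n floats.
    Assumption of this sketch: the transmitted point is taken along the
    centre of the cuboid.
    """
    it = iter(points)
    t0, first = next(it)
    x0 = list(first)                      # first point is transmitted as-is
    out = [(t0, tuple(x0))]
    n = len(x0)
    lo = [float("-inf")] * n              # lower corner of the cuboid
    hi = [float("inf")] * n               # upper corner of the cuboid
    t_prev = t0
    for t, x in it:
        dt = t - t0
        # Intersect the cuboid with the cube associated with the new point.
        new_lo = [max(l, (xi - eps - oi) / dt) for l, xi, oi in zip(lo, x, x0)]
        new_hi = [min(h, (xi + eps - oi) / dt) for h, xi, oi in zip(hi, x, x0)]
        if any(l > h for l, h in zip(new_lo, new_hi)):
            # Empty intersection: transmit at the previous timestamp.
            x0 = [oi + 0.5 * (l + h) * (t_prev - t0)
                  for oi, l, h in zip(x0, lo, hi)]
            t0 = t_prev
            out.append((t0, tuple(x0)))
            dt = t - t0
            new_lo = [(xi - eps - oi) / dt for xi, oi in zip(x, x0)]
            new_hi = [(xi + eps - oi) / dt for xi, oi in zip(x, x0)]
        lo, hi, t_prev = new_lo, new_hi, t
    if t_prev != t0:
        # Flush the end of the stream.
        out.append((t_prev, tuple(oi + 0.5 * (l + h) * (t_prev - t0)
                                  for oi, l, h in zip(x0, lo, hi))))
    return out
```

The state is limited to two corners of the cuboid and the latest transmitted point, which is consistent with a constant memory footprint.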
With the Euclidean norm, the intersection test is more complex. We keep in memory a growing set $S$ of balls and the bounding box $B$ of their intersection. Then, when a new point arrives, we consider the associated ball and our intersection test works as in Algorithm 3. box is a function that returns the bounding box of an n-ball. find_bisection(S, B) searches for a point in all the elements in S, using plane sweep and bisection initialized by the bounds of B. Our code is available at https://github.com/big-data-lab-team/stream-summarization under MIT license.
V Experiments and Results
We conducted two experiments using Motsai’s Neblina module, a system with a Nordic Semiconductor nRF52832 micro-controller, 64 KB of RAM, and Bluetooth Low Energy connectivity. Neblina has a 3D accelerometer, a 3D gyroscope, a 3D magnetometer, and environmental sensors for humidity, temperature and pressure. The platform is equipped with sensor fusion algorithms for 3D orientation tracking and a machine learning engine for complex motion analysis and motion pattern recognition [sarbishei2016accuracy]. Neblina has a battery of 100 mAh; at 200 Hz, its average consumption is 2.52 mA when using accelerometer and gyroscope sensors but without radio transmission, and 3.47 mA with radio transmission, leading to an autonomy of 39.7 h without transmission and 28.8 h with transmission.
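The autonomy figures follow directly from the battery capacity and the average currents quoted above. A minimal check, assuming an ideal battery that delivers its full nominal capacity (the function name is hypothetical):

```python
def autonomy_hours(capacity_mah, current_ma):
    """Autonomy of an ideal battery: capacity divided by average current."""
    return capacity_mah / current_ma

without_radio = autonomy_hours(100, 2.52)   # accelerometer and gyroscope only
with_radio = autonomy_hours(100, 3.47)      # with radio transmission
```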
V-A Experiment 1: validation
We validated the behaviour of our LTC extension on a PC using data acquired with Neblina. We collected two 3D accelerometer time-series, a short one and a longer one, acquired on two different subjects performing biceps curl, with a 50 Hz sampling rate (see Figure 3). In both cases, the subject was wearing Neblina on their wrist, as in Figure 2. It should be noted that the longer time-series also has a higher amplitude, perhaps due to differences between subjects.
We compressed the time-series with various values of $\epsilon$, using our 2D (x and y) and 3D (x, y and z) implementations of LTC. On Neblina, the raw uncalibrated accelerometer data corresponds to errors around 20 mg (1 g is 9.8 m/s²). We used a laptop computer with 16 GB of RAM, an Intel i5-3210M CPU @ 2.50 GHz × 4, and Linux Fedora 27. We measured memory consumption using Valgrind’s massif tool [nethercote2006building], and processing time using gettimeofday() from the GNU C Library.
Results are reported in Table I. As expected, the compression ratio increases with $\epsilon$, and the maximum measured error remains lower than $\epsilon$ in all cases. The maximum is reached most of the time on these time-series.
Infinity vs Euclidean norms
The average ratio between the compression ratios obtained with the infinity and Euclidean norm is 1.03 for 2D data, and 1.06 for 3D data. These ratios are lower than the theoretical values of $4/\pi \approx 1.27$ in 2D and $6/\pi \approx 1.91$ in 3D, which are obtained for random-uniform signals. Unsurprisingly, the infinity norm surpasses the Euclidean norm in terms of resource consumption. Memory-wise, the infinity norm requires a constant amount of 80 B, used to store the intersection of n-balls. The Euclidean norm, however, uses up to 4.7 KB of memory for the Long time-series in 3D with $\epsilon$ = 48.8 mg. More importantly, the amount of required memory increases for longer time-series, and it also increases with larger values of $\epsilon$. Similar observations are made for the processing time, with values ranging from 0.4 ms for the simplest time-series and smallest $\epsilon$, to 41.3 ms for the most complex time-series and largest $\epsilon$.
2D vs 3D
For a given $\epsilon$, the compression ratios are always higher in 2D than in 3D. This makes sense, since the probability for the signal to deviate from a straight-line approximation is higher in 3D than it is in 2D. Besides, resource consumption is higher in 3D than in 2D: for the infinity norm, 3D consumes 1.4 times more memory than 2D (1.8 times on average for the Euclidean norm), and processing time is 1.35 times longer (1.34 times on average for the Euclidean norm).
V-B Experiment 2: impact on energy consumption
We acquired two 3D accelerometer time-series at 200 Hz for two activities: walking and running (see Figure 4). In both cases, the subject was wearing Neblina on their wrist as in Experiment 1. We collected 1,000 data points for each activity, corresponding to 5 seconds of activity.
We measured energy consumption associated with the transmission of these time-series by “replaying” the time-series after loading them as a byte array in Neblina. We measured the current every 500 ms. We also measured the max and average latency resulting from compression.
Results are reported in Table II. For a given $\epsilon$ and norm, the compression ratio is larger for walking than for running. The ratio of saved energy is relative to the reference current of 3.47 mA measured when Neblina transmits data without compression. In all cases, activating compression saves energy. The reduction in energy consumption behaves as the compression ratio: it increases with $\epsilon$, it is higher for the infinity norm than for the Euclidean, and it is higher for the walking activity than for running. For a realistic error of $\epsilon$ = 9.8 mg, the ratio of saved energy with the infinity norm is close to 20% for the walking activity, which is substantial. Latency is higher for walking than for running, and it is also higher for the Euclidean norm than for the infinity norm. In all cases, the latency remains lower than the 5-ms tolerable latency at 200 Hz, which demonstrates the feasibility of 3D LTC compression.
We presented an extension of the Lightweight Temporal Compression method to dimension $n$ that can be instantiated for any distance function. Our extension formulates LTC as an intersection detection problem between n-balls. We implemented our extension on Neblina for the infinity and Euclidean norms, and we measured the energy reduction induced by compression for acceleration streams acquired during human activities.
We conclude from our experiments that the proposed extension to LTC is well suited to reduce energy consumption in connected objects. The implementation behaves better with the infinity norm than with the Euclidean one, due to the time complexity of the current algorithm to detect the intersection between n-balls for the Euclidean norm.
Our future work focuses on this latter issue. We plan to start from Helly’s theorem, which only provides an algorithm of complexity $\mathcal{O}\left(\binom{m}{n+1}\right)$ to compress $m$ points in dimension $n$. We note that Helly’s theorem holds for arbitrary convex subsets of $\mathbb{R}^n$, while we are considering a sequence of balls of decreasing radius. Based on this observation, a stronger result might exist that would lead to a more efficient implementation of LTC with the Euclidean norm. Our current idea is to search for a point expressed as a function of the ball centres that would necessarily belong to the ball intersection when it is not empty; such a point, if it exists, necessarily converges to the centre of the last ball in the sequence as $m$ increases, as the radius of the last ball decreases to zero. The resulting algorithm would then compute this point and check its inclusion in every ball, which is done in $\mathcal{O}(m)$ complexity.
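The inclusion check mentioned above is simple to express once a candidate point is available. The sketch below only illustrates that final, linear-time step for the Euclidean norm; how to compute the candidate point is the open question, and the function name and ball representation are hypothetical.

```python
import math

def in_all_balls(z, balls):
    """Return True if point z lies in every Euclidean ball.

    `balls` is an iterable of (centre, radius) pairs, where each centre
    has the same dimension as z. One distance computation per ball.
    """
    return all(math.dist(z, centre) <= radius for centre, radius in balls)

# Hypothetical 2D example: two unit disks centred at (0, 0) and (1, 0).
balls = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0)]
inside = in_all_balls((0.5, 0.0), balls)
outside = in_all_balls((2.0, 0.0), balls)
```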
The choice of an appropriate norm should not be underestimated. Some situations might be better described with the Euclidean norm than with the infinity norm, such as the ones involving position or movement measures. Using the infinity norm instead of the Euclidean would lead to important error differences, proportional to $\sqrt{n}$ in dimension $n$. Investigating other norms, in particular the 1-norm, would be relevant too.
This work was funded by the Natural Sciences and Engineering Research Council of Canada.