Remote Multilinear Compressive Learning with Adaptive Compression

09/02/2021
by   Dat Thanh Tran, et al.
Tampere Universities

Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals. The level of signal compression affects the detection or classification performance of an MCL model, with higher compression rates often associated with lower inference accuracy. However, higher compression rates are more amenable to a wider range of applications, especially those that require low operating bandwidth and minimal energy consumption, such as Internet-of-Things (IoT) applications. Many communication protocols provide support for adaptive data transmission to maximize the throughput and minimize energy consumption. By developing compressive sensing and learning models that can operate with an adaptive compression rate, we can maximize the informational content throughput of the whole application. In this paper, we propose a novel optimization scheme that enables such a feature for MCL models. Our proposal enables practical implementation of adaptive compressive signal acquisition and inference systems. Experimental results demonstrate that the proposed approach can not only significantly reduce the amount of computation required during the training phase of remote learning systems but also improve the informational content throughput via adaptive-rate sensing.


I Introduction

Internet-of-Things (IoT) technologies allow the deployment of smart sensors in every corner of the world, including the most remote areas. This is thanks to the development of low-cost electronics and embedded devices, as well as advancements in wireless communication technologies such as those in Low Power Wide Area Networks (LPWAN). An IoT system is a cyber-physical system in which the physical (client) side often consists of a network of smart sensors and embedded devices deployed in various physical locations, with the capability to acquire signals and perform light-weight computation before transmitting the results to a network cloud. On the cyber (server) side, the collected data, which is aggregated and analyzed by the network server, can be used for automatic decision making and intelligent services [18, 17, 26]. Since the whole idea of IoT systems is built on information gathering, efficient signal acquisition and transmission play an important role in IoT applications. Among the different factors, energy consumption and computational complexity are key in determining the efficiency of data collection in IoT services, since deployed sensors are often small, low-power embedded devices with limited computational capacity and energy sources.

Under such system requirements, Compressive Sensing (CS) [5], an efficient signal acquisition paradigm, is a suitable solution for sensor data collection in an IoT stack. The efficiency of CS devices comes from the fact that signals are sampled, quantized, and compressed at the same time, at the hardware level, i.e., on the sensors. This is different from traditional sensors, from which we obtain a large number of discrete samples or raw measurements of the signal, with the compression step conducted after the acquisition step, at the software level. In CS, the signal is measured and compressed at the same time, before being registered in memory during the sampling phase. Because of this, a CS sensor requires a much smaller memory to store temporary data, and outputs a discrete signal with a significantly lower number of measurements for storage and transmission.

Given an input signal $\mathbf{x} \in \mathbb{R}^{n}$ that has been sampled and quantized, a CS device, instead of outputting $\mathbf{x}$, registers a compressed version $\mathbf{y}$ of $\mathbf{x}$ by linearly projecting $\mathbf{x}$ onto a lower-dimensional space as follows:

$$\mathbf{y} = \boldsymbol{\Phi} \mathbf{x} \qquad (1)$$

where $\mathbf{y} \in \mathbb{R}^{m}$ denotes the measurement of the input signal and $\boldsymbol{\Phi} \in \mathbb{R}^{m \times n}$ denotes the projection matrix. The number of measurements $m$ obtained by a CS device is often significantly lower than the dimension $n$ of $\mathbf{x}$, i.e., $m \ll n$, so $\boldsymbol{\Phi}$ is a fat matrix. $\boldsymbol{\Phi}$ is also referred to as the sensing operator or sensing matrix.
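As a concrete illustration of Eq. (1), the following is a minimal NumPy sketch of the linear sensing model; the signal dimension, measurement count, Gaussian sensing matrix, and variable names are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch of the linear CS model y = Phi @ x in Eq. (1).
# All sizes here are hypothetical; a real CS device realizes Phi in hardware.
import numpy as np

rng = np.random.default_rng(0)

n, m = 1024, 128                                 # signal dimension and number of measurements, m << n
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # "fat" m x n sensing matrix

x = rng.standard_normal(n)                       # sampled-and-quantized input signal
y = Phi @ x                                      # compressed measurement registered by the device

print(y.shape)                                   # (128,) -- far fewer values than the original 1024
```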

Here we should note that what we obtain from a traditional sensor is $\mathbf{x}$, which is often sampled at a rate higher than the Nyquist rate to ensure perfect reconstruction of the input signal. Since the dimension of $\mathbf{y}$ is lower than that of $\mathbf{x}$, the signal is undersampled in a CS device. Although the Shannon sampling theorem specifies that a signal must be sampled at a rate higher than the Nyquist rate to guarantee perfect reconstruction, the undersampled signal registered by a CS device can still be recovered near perfectly if the input signal admits a sparse representation in some domain and the sensing matrix possesses certain properties [4, 9]. These results are known as CS theory, which is the foundation and motivation for the development of compressive sensing and compressive learning methods.

Although signals can be acquired at a very low cost using the CS paradigm, reversing them to their original domains is a daunting task, since it involves solving an optimization problem to determine the sparse representation and the corresponding bases of the signal. In some use cases, signal recovery is necessary and the quality of signal reconstruction plays an important role. For example, in medical imaging for health diagnosis, the higher the resolution and fidelity of the reconstructed signal, the better the chance of distinguishing between a health abnormality and noise generated by the sensing process. However, for many applications, the purpose of acquiring signals is to perform classification or regression of some kind, rather than to reconstruct the original signals. For example, in forest fire detection and monitoring via unmanned aerial vehicles, the objective of forest imaging is to automatically detect and locate potential locations of fire. In some applications that involve humans, such as those in smart buildings, the reconstruction of data should be avoided since this step can potentially disclose private information. In fact, the majority of IoT services and automation systems gather data mainly for intelligence and decision-making purposes, rather than high-fidelity reconstruction.

For the aforementioned reasons, researchers have proposed machine learning models that are tailored to work directly with the compressed measurements obtained from CS devices, without going through a proxy signal recovery step. Methodologies developed under this objective form a research topic known as Compressive Learning (CL). The early works in CL took an approach similar to the CS literature, relying on random-valued sensing operators and focusing on theoretical guarantees for models operating on compressed measurements [3, 7, 6, 19]. The reliance on random sensing matrices not only exempts us from the problem of jointly estimating the parameters of the learning model and the sensing operator, but also allows us to take advantage of existing theoretical results on random linear projections [2]. Besides, theoretical results on perfect signal reconstruction in CS were derived for random sensing matrices. Since the ability to perform high-fidelity signal recovery also implies the preservation of signal content in the compressed measurement, there is an assurance that the machine learning model is estimated with the same degree of informational content in such cases.

With the wide adoption of modern stochastic optimization methods during the past decade, more recent works in CL have switched from using random sensing matrices to optimized ones, which are jointly estimated with the model's parameters. Although theoretical results for the generalization ability of compressive learning models following the end-to-end learning approach are yet to be derived, several works have demonstrated its superior performance over prior setups that use random sensing operators [1, 15, 10, 27, 24, 22]. Among these end-to-end compressive learning models, Multilinear Compressive Learning (MCL) [24] is the leading solution in terms of both inference performance and computational efficiency for multidimensional signals. This stems from the fact that MCL employs sensing and feature extraction modules that are designed to operate on tensors using multilinear operations. Since a multidimensional input signal is linearly projected along each tensor mode in MCL, the number of computations used to compress a given signal is significantly lower than for the projection in Eq. (1), while the multidimensional structure of the input signal is retained after the projection.

Regardless of the approach followed to project the input signals, existing CL formulations require training separate model configurations for different compression rates. This usually results in long experimentation processes in order to determine a suitable trade-off between the compression rate and the inference performance when deploying a CL system. For applications that employ remote compressive sensing, a fixed compression rate for signal acquisition is undesirable. This is because the majority of modern communication standards support adaptive transmission rates to maximize throughput and minimize energy consumption as the transmission environment changes. For example, the Adaptive Data Rate (ADR) scheme is a key feature of Long-Range Wide Area Networks (LoRaWAN) in IoT technology. By giving CL solutions the ability to acquire the signal at an adaptive compression rate at the sensor level, and hence an adjustable degree of signal fidelity that can follow the network status, we can further maximize the signal content throughput of a CL system.

In this paper, we propose a novel optimization scheme and deployment setup for MCL that enables training multiple model configurations with different compression rates in one shot. The resulting system can be used to evaluate and benchmark different compressed measurement shapes, which significantly reduces the experimentation effort required to find the optimal trade-off between data compression rate and inference performance. In addition, using the proposed optimization scheme, we can obtain sensing operators that produce highly structured compressed measurements, which allow us to implement a CS device capable of signal acquisition at an adaptive compression rate, thereby addressing the aforementioned shortcoming of existing approaches.

The paper is organized as follows. Section II reviews the MCL model and related literature. Section III provides details of our proposed optimization scheme and deployment setup. In Section IV, we provide detailed information about our experimental protocol, and qualitative and quantitative analyses of the empirical results. Section V concludes our work with remarks on the implications of the proposed contributions.

II Related Work

In this section, before discussing related works, we describe the Multilinear Compressive Learning framework, which is the basis of the proposed method. A Multilinear Compressive Learning (MCL) model [24], illustrated in Figure 1, comprises three components: the Multilinear Compressive Sensing (MCS) module, the Feature Synthesis (FS) module, and the task-specific neural network $f$.

Fig. 1: Illustration of the MCL model

Different from the sensing model described in Eq. (1), which only considers input signals in vector form, the Multilinear Compressive Sensing (MCS) model employed in MCL performs linear sensing along each dimension of a given multidimensional input signal. Thus, the sensing is executed via a set of sensing matrices, also known as separable sensing operators. More specifically, let us denote the discretized input signal and the compressed measurement obtained from the MCS module as $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}$ and $\mathcal{Y} \in \mathbb{R}^{M_1 \times \cdots \times M_K}$, respectively. In this case, $I_1 \times \cdots \times I_K$ represents the signal resolution that the sensor initially samples and quantizes. The MCS compression model is described by the following equation:

$$\mathcal{Y} = \mathcal{X} \times_1 \boldsymbol{\Phi}_1 \times_2 \boldsymbol{\Phi}_2 \cdots \times_K \boldsymbol{\Phi}_K \qquad (2)$$

where $\{\boldsymbol{\Phi}_k \in \mathbb{R}^{M_k \times I_k} \mid k = 1, \dots, K\}$ denotes the set of separable sensing operators and $\times_k$ denotes the mode-$k$ product between a tensor and a matrix. A detailed description of the mode-$k$ product can be found in [12]. Basically, this operation linearly transforms every mode-$k$ fiber of the tensor using the corresponding matrix $\boldsymbol{\Phi}_k$.

Here we should note that at deployment the compressive sensing step described in the above equation (as well as the one in Eq. (1)) is implemented at the hardware level, i.e., on the sensing device. What we obtain from this device is $\mathcal{Y}$ (or $\mathbf{y}$, respectively), i.e., the compressed measurement. Thus, the sensing step in Eq. (2) should not be viewed as a feature extraction step, since it is inherent in the signal acquisition procedure of the CS device. For end-to-end CL approaches, during development and optimization, we often simulate this signal acquisition step at the software level, using high-resolution signals that were sampled above the Nyquist rate by a standard sensor, in order to optimize the sensing operators.
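To illustrate Eq. (2), below is a minimal NumPy sketch of the multilinear sensing step for a 3D signal; `mode_n_product` is a hypothetical helper implementing the standard mode-n (tensor-times-matrix) product, and the shapes and random operators are toy choices for illustration only.

```python
# Minimal sketch of the MCS step in Eq. (2): Y = X x1 Phi1 x2 Phi2 x3 Phi3.
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product: multiply tensor T by matrix M along mode n, so the
    n-th dimension of T (size I_n) is replaced by M.shape[0]."""
    T = np.moveaxis(T, n, 0)                      # bring mode n to the front
    front, rest = T.shape[0], T.shape[1:]
    out = M @ T.reshape(front, -1)                # linear map applied to every mode-n fiber
    return np.moveaxis(out.reshape((M.shape[0],) + rest), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 32, 3))              # high-resolution input signal
Phis = [rng.standard_normal((m, i))               # separable sensing operators Phi_k (M_k x I_k)
        for m, i in [(9, 32), (9, 32), (1, 3)]]

Y = X
for k, Phi_k in enumerate(Phis):                  # apply the mode-k products in turn
    Y = mode_n_product(Y, Phi_k, k)

print(Y.shape)                                    # (9, 9, 1): the compressed measurement
```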

Given the compressed measurement $\mathcal{Y}$, MCL synthesizes, via the FS module, a high-dimensional tensor feature that is relevant for the learning task. In order to preserve the multidimensional structure of the compressed measurement, the FS module proposed in [24] also employs a multilinear transform. However, since the FS module is implemented at the software level, usually on the computing cloud for remote sensing applications, it is not constrained to the use of multilinear operations as long as the tensor structure of the signal is preserved. For example, the authors in [22] extended the original design of the MCL model in [24] with an FS module that contains several convolution and up-sampling layers. The choice of the FS module mainly depends on the operating power of the computing server. In this work, we investigate our optimization scheme using the original design in [24] for the FS component, which can be described by the following equation:

$$\mathcal{T} = g(\mathcal{Y}; \boldsymbol{\Theta}) = \mathcal{Y} \times_1 \boldsymbol{\Theta}_1 \times_2 \boldsymbol{\Theta}_2 \cdots \times_K \boldsymbol{\Theta}_K \qquad (3)$$

where $\mathcal{T}$ denotes the synthesized feature, $\boldsymbol{\Theta} = \{\boldsymbol{\Theta}_k \in \mathbb{R}^{I_k \times M_k}\}$ denotes the parameters of the FS component, and $g(\cdot)$ denotes its functional form.

Finally, $\mathcal{T}$ is introduced to the task-specific neural network $f$ to produce a prediction for the given compressed measurement. The architecture of $f$, which depends on the given problem at hand, is the same architecture that one would use to classify uncompressed signals, i.e., the high-resolution signal $\mathcal{X}$. For this reason, the dimensions of $\mathcal{T}$ in Eq. (3) are the same as those of $\mathcal{X}$, i.e., the high-resolution signal.
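Continuing the sketch above, the original multilinear FS design in Eq. (3) maps the compressed measurement back to a tensor with the dimensions of the input; the `Thetas` shapes below are illustrative and reuse the hypothetical `mode_n_product` helper.

```python
# Minimal sketch of the multilinear FS module in Eq. (3):
# T = Y x1 Theta1 x2 Theta2 x3 Theta3, with Theta_k of shape (I_k, M_k).
Thetas = [rng.standard_normal((i, m)) for i, m in [(32, 9), (32, 9), (3, 1)]]

T = Y
for k, Theta_k in enumerate(Thetas):              # expand each compressed mode back to I_k
    T = mode_n_product(T, Theta_k, k)

print(T.shape)                                    # (32, 32, 3): same dimensions as X
```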

In MCL, all model parameters are jointly optimized in an end-to-end manner by a stochastic gradient descent optimizer. Different from the conventional approach in which the parameters are initialized with random values, MCL determines the initial values of each component's parameters by solving two optimization problems: one for the MCS and FS modules and one for the task-specific neural network. As mentioned previously, methods that optimize the sensing operator require a labeled set of high-resolution signals, obtained using a standard sensor sampling at a higher-than-Nyquist rate, in order to simulate the compressive sensing step during optimization. Let us denote by $N$ the number of samples in the training set. In addition, we denote the $i$-th high-resolution sample in the training set as $\mathcal{X}_i$, its compressed measurement as $\mathcal{Y}_i$, the corresponding synthesized feature as $\mathcal{T}_i$, and its label as $c_i$. The initial parameter values of the MCS and FS modules are obtained by solving the Higher Order Singular Value Decomposition (HOSVD) [8] of the set of high-resolution signals $\{\mathcal{X}_i\}$. More specifically, let $\mathbf{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K \times N}$ denote the concatenation of all samples along the $(K{+}1)$-th dimension. In addition, let us denote the HOSVD of $\mathbf{X}$ along the first $K$ dimensions as follows:

$$\mathbf{X} = \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_K \mathbf{U}_K \qquad (4)$$

The sensing operators are initialized by setting $\boldsymbol{\Phi}_k = \mathbf{U}_k(:, 1{:}M_k)^{\mathsf{T}}$, and the FS parameters are initialized by setting $\boldsymbol{\Theta}_k = \mathbf{U}_k(:, 1{:}M_k)$. Here $\mathcal{S}$ denotes the tensor core that contains the singular values and $\mathbf{U}_k$ ($k = 1, \dots, K$) denote the factor matrices of the decomposition. By following the above initialization scheme for the sensing and the feature extraction matrices, the energy of $\mathcal{X}$ in $\mathcal{Y}$ is preserved, since the $\mathbf{U}_k$ ($k = 1, \dots, K$) are unitary matrices.
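Below is a minimal NumPy sketch of this HOSVD-based initialization, under the assumption of toy random data standing in for the training set; `mode_n_unfolding` is a hypothetical helper, and the leading left singular vectors of each mode-k unfolding play the role of the factor matrices U_k.

```python
# Minimal sketch of the HOSVD-based initialization of the MCS/FS modules.
import numpy as np

def mode_n_unfolding(T, n):
    """Unfold tensor T along mode n into a matrix of shape (T.shape[n], -1)."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

rng = np.random.default_rng(0)
X_all = rng.standard_normal((32, 32, 3, 64))      # N=64 toy samples stacked along mode K+1
M = (9, 9, 1)                                     # target compressed measurement shape

Phis_init, Thetas_init = [], []
for k in range(3):                                # factor matrices along the first K=3 modes
    U, _, _ = np.linalg.svd(mode_n_unfolding(X_all, k), full_matrices=False)
    Phis_init.append(U[:, :M[k]].T)               # Phi_k = U_k(:, 1:M_k)^T  (sensing init)
    Thetas_init.append(U[:, :M[k]])               # Theta_k = U_k(:, 1:M_k)  (FS init)
```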

The parameters of the task-specific neural network are initialized by training $f$ on the high-resolution signals to minimize the classification loss:

$$\min_{\boldsymbol{\theta}} \; \sum_{i=1}^{N} L\big(f(\mathcal{X}_i; \boldsymbol{\theta}), c_i\big) \qquad (5)$$

where $\boldsymbol{\theta}$ denotes the parameters of $f$, and $f(\mathcal{X}_i; \boldsymbol{\theta})$ denotes its prediction given the input $\mathcal{X}_i$. The learning loss function is denoted by $L$.

Slightly related to our work is the work in [23], which studies a surrogate performance indicator that allows fast estimation of the ranking between different model configurations in MCL. In [23], the authors found that the mean-squared error obtained during the initialization step of MCL can be used as a performance indicator, since this quantity exhibits a high correlation with the final classification error. The work in [25] also bears some similarity to ours in the sense that the method is also capable of learning a single neural network that can classify measurements of different sizes. However, the method in [25] differs from our work since it was proposed for vector-based CL utilizing a random sensing operator, and it trains the classifier based on data augmentation. Since the proposed adaptive-compression MCL model relies on compressed measurements with a predefined semantic structure, works on learning disentanglement in variational autoencoders, see for example [16], are slightly related to our method. The main idea of learning disentanglement in generative models is to enforce certain dimensions of the feature to encode specific visual features of generated images. Different from this, our method aims to enforce an ordering structure in the compressed measurements so that multiple compressed measurements can be extracted from a single measurement.

III Proposed Method

Fig. 2: Illustration of compressed measurements of higher compression rates constructed from the original compressed measurement $\mathcal{Y}$.

The main motivation of our work is to develop a remote Compressive Learning system that is capable of compressive signal acquisition and learning with an adaptive compression rate. The ability to adaptively adjust the compression rate, and hence the degree of signal fidelity and the amount of data transmitted for each sample, can significantly enhance the information content throughput of a remote sensing and learning application. This is because in real-world scenarios the conditions of the data transmission medium, especially in wireless communication, can vary over time; thus, many communication protocols support adaptive transmission rates. The availability of such a feature in a communication protocol enables a network to maximize its throughput while minimizing energy consumption. However, this feature alone cannot maximize the data content throughput, i.e., the amount of signal information transmitted in a period of time. For example, the streaming server in a video streaming service must both allow an adjustable transmission rate and send video frames with a resolution that adapts to the network strength in order to ensure a consistent number of frames per second, and thus an uninterrupted viewing experience.

From the hardware point of view, it is infeasible to implement an MCS sensor with an adaptive compression rate through the use of multiple sets of sensing operators, each of which corresponds to a different compression rate. However, with a single set of sensing operators producing a compressed measurement $\mathcal{Y} \in \mathbb{R}^{M_1 \times \cdots \times M_K}$, we can obtain a compressed measurement $\mathcal{Y}'$ of a given size $M'_1 \times \cdots \times M'_K$ ($M'_k \le M_k$) that corresponds to a (higher) compression rate by forming $\mathcal{Y}'$ from the elements in $\mathcal{Y}$, i.e.:

$$\mathcal{Y}'[m_1, \dots, m_K] = \mathcal{Y}[m_1, \dots, m_K], \quad \forall\, m_k \in \{1, \dots, M'_k\} \qquad (6)$$

where $\mathcal{Y}[m_1, \dots, m_K]$ denotes the element of $\mathcal{Y}$ at position $(m_1, \dots, m_K)$. To construct multiple instances of $\mathcal{Y}'$, each of which carries an amount of signal information approximately proportional to its size, we can optimize the set of sensing operators in such a way that the resulting $\mathcal{Y}$ possesses a predefined semantic structure. Specifically, we aim to learn a set of sensing operators that results in a $\mathcal{Y}$ in which the elements carrying the most relevant signal information for the learning task concentrate around the zero-corner of $\mathcal{Y}$, i.e., the corner at position $(1, \dots, 1)$. Furthermore, the elements are arranged according to their importance, with elements closer to position $(1, \dots, 1)$ being more relevant.

With $\mathcal{Y}$ having the aforementioned structure, a compressed measurement $\mathcal{Y}'$ corresponding to a higher compression rate is a sub-tensor of $\mathcal{Y}$, that is:

$$\mathcal{Y}' = \mathcal{Y}[1{:}M'_1, 1{:}M'_2, \dots, 1{:}M'_K] \qquad (7)$$

The construction of $\mathcal{Y}'$ is illustrated in Figure 2. One feature of the proposed semantic structure for $\mathcal{Y}$ is its computational efficiency. Since this structure allows the construction of $\mathcal{Y}'$ from contiguous elements of $\mathcal{Y}$, the indexing operation needed to create $\mathcal{Y}'$ only requires accessing contiguous memory locations, a process that is more hardware-friendly than using a set of non-contiguous indices.
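In code, the construction of $\mathcal{Y}'$ in Eq. (7) amounts to a plain contiguous slice at the zero-corner of the measurement tensor, as the following minimal NumPy sketch shows (the shapes are illustrative assumptions):

```python
# Minimal sketch of Eq. (7): a higher-compression measurement is the
# contiguous sub-tensor at the zero-corner of the full measurement Y.
import numpy as np

Y = np.arange(9 * 9 * 1, dtype=float).reshape(9, 9, 1)   # full compressed measurement

M_prime = (6, 6, 1)                                       # smaller target measurement shape
Y_prime = Y[:M_prime[0], :M_prime[1], :M_prime[2]]        # contiguous zero-corner slice

print(Y_prime.shape)                                      # (6, 6, 1), transmitted at the higher rate
```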

Fig. 3: Illustration of the proposed training method with the stochastic binary mask $\mathcal{M}$.

Here we should note that $\mathcal{Y}'$ is constructed on the client side, before being transmitted to the server. On the server side, where the FS module and the task-specific neural network are implemented to make predictions on incoming compressed measurements of different sizes using a single instance of the FS module and $f$, the FS module must be able to handle variable-size inputs. A simple solution to this requirement is to set the input size of the FS module to the maximum size of incoming compressed measurements, i.e., the size of $\mathcal{Y}$, and to zero-pad the incoming compressed measurements appropriately to form tensors of a fixed size.
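A minimal sketch of this server-side zero padding, assuming NumPy and illustrative shapes, is given below; the padding fills the positions of the dropped, less relevant elements with zeros before the measurement enters the fixed-input FS module.

```python
# Minimal sketch of server-side zero padding of a variable-size measurement.
import numpy as np

def pad_to_max(Y_prime, max_shape):
    """Zero-pad a received sub-measurement up to the maximum measurement shape."""
    pad = [(0, m - s) for s, m in zip(Y_prime.shape, max_shape)]
    return np.pad(Y_prime, pad)                   # zeros go at the far corner of each mode

Y_padded = pad_to_max(np.ones((6, 6, 1)), (9, 9, 1))
print(Y_padded.shape)                             # (9, 9, 1): fixed input size for the FS module
```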

To this end, we define the following criteria for optimizing a remote MCL system: (i) the sensing step produces a $\mathcal{Y}$ that has the proposed semantic structure, and (ii) the server side utilizes a single model instance to make predictions on variable-size compressed measurements. Here we should note that the initialization step (using HOSVD) in MCL cannot, by itself, induce the proposed structure in the compressed measurements after the whole model has been optimized. This is because during stochastic optimization all parameters are jointly updated to optimize the learning objective, and there is no explicit constraint to induce such a feature. This is evidenced by our experiments in Section IV. In order to satisfy both criteria using end-to-end training with stochastic gradient descent, we propose to randomly simulate the effect of variable compression rates by means of stochastic structural dropout. Dropout [21], which randomly zeroes out intermediate representations in a neural network during optimization, is a regularization technique. It can be considered as training a virtual ensemble of sub-networks within the main model and is thus effective at reducing overfitting. In our case, we use dropout masks having predefined structures not only to train sub-networks that correspond to different compression rates but also to enforce a semantic structure in $\mathcal{Y}$. More specifically, after performing the initialization steps of MCL described in Section II, a stochastic gradient descent-based optimizer is used to train all components of MCL with the following objective:

$$\min_{\{\boldsymbol{\Phi}_k\}, \boldsymbol{\Theta}, \boldsymbol{\theta}} \; \sum_{i=1}^{N} L\Big(f\big(g(\mathcal{Y}_i \odot \mathcal{M}; \boldsymbol{\Theta}); \boldsymbol{\theta}\big), c_i\Big) \qquad (8)$$

where $\odot$ denotes the element-wise multiplication operator and $\mathcal{M}$ is a random binary tensor having the following structure:

$$\mathcal{M}[m_1, \dots, m_K] = \begin{cases} 1, & \text{if } m_k \le R_k \text{ for all } k \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

where $R_k$ denotes an integer value randomly drawn from the set $\{M_k^{\min}, \dots, M_k^{\max}\}$ for all $k = 1, \dots, K$, with $M_k^{\min}$ and $M_k^{\max}$ denoting predefined minimum and maximum values for a given dimension of the compressed measurement. In practice, based on the specifications of the transmission network, we can always estimate suitable values of $M_k^{\min}$ and $M_k^{\max}$ for all $k$.

The proposed training process with the stochastic binary mask $\mathcal{M}$ is illustrated in Figure 3. The proposed dropout strategy is a simple yet efficient way to express our goal of inducing the aforementioned semantic structure during optimization: with the stochastic dropout mask, we simply instruct the stochastic optimizer to optimize all parameters of the model so that the classification error is minimized for any compressed measurement size that lies between the minimum and maximum sizes. By applying the binary mask to $\mathcal{Y}$, we implicitly train the MCL model to perform sensing and learning with a compressed measurement of size $R_1 \times \cdots \times R_K$, which is randomly defined during stochastic optimization. To do so, the value of $\mathcal{M}$ is changed in every forward pass for every training sample during the optimization.
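The following minimal NumPy sketch shows how the stochastic structural mask of Eq. (9) could be drawn and applied per training sample; the shapes and the uniform sampling of $R_k$ are illustrative assumptions consistent with the description above.

```python
# Minimal sketch of the stochastic structural dropout mask in Eq. (9).
import numpy as np

rng = np.random.default_rng(0)

def random_structural_mask(min_shape, max_shape):
    """Binary mask keeping only a zero-corner block of random size R_1 x ... x R_K."""
    R = [int(rng.integers(lo, hi + 1)) for lo, hi in zip(min_shape, max_shape)]
    mask = np.zeros(max_shape)
    mask[tuple(slice(0, r) for r in R)] = 1.0     # ones on the kept sub-tensor, zeros elsewhere
    return mask

# A fresh mask is drawn for every forward pass of every training sample;
# the masked measurement Y * mask is then fed to the FS module.
mask = random_structural_mask((3, 3, 1), (9, 9, 1))
```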

Finally, we should note that in order to adopt the proposed MCL with adaptive compression rate, the server side is not constrained to use a single instance of the FS module and task-specific neural network. While the client side, i.e., the sensing device, has critical limitations in terms of computational power and energy, there is no inherent limitation on the server. In case the server has enough computational power, we can increase the performance of the entire system by running multiple FS modules and task-specific neural networks to make predictions for the different compressed measurement sizes. That is, after optimizing the parameters of the entire MCL system using the loss function in Eq. (8), we can fix the sensing operators and finetune the parameters of the server-side components (FS and $f$) for each compression rate. During deployment, we still implement a single set of sensing operators on the sensing device, and compressed measurements of different sizes are constructed adaptively as sub-tensors of the output of the sensing device, as described previously. On the server side, based on the size of the received compressed measurement, the prediction is generated with the corresponding FS and $f$ modules. In Section IV, we show that this approach can significantly enhance the overall performance of the system.

IV Experiments

TABLE I: Test Accuracy (%) on CIFAR10 for the single-rate, one-shot baseline, and adaptive-rate methods, together with each method's average accuracy and total number of training epochs.

This section provides the empirical analysis conducted to benchmark our training scheme for MCL models. Information about the datasets and the experimental protocol is presented first, followed by the experimental results and discussion.

IV-A Datasets and Experiment Protocol

Our experiments were conducted using publicly available image datasets describing object classification and face recognition tasks. Object and face recognition are necessary features in smart buildings and surveillance systems. As mentioned before, in order to train end-to-end compressive learning models, we need labeled data that has been collected by standard sensors. For this reason, we used the CIFAR [13] and CelebA [14] datasets in our experiments, both of which are publicly available. A brief description of the datasets and our data split is provided below:

  • The CIFAR dataset [13] is an RGB image dataset containing thumbnail-size images of resolution 32×32 pixels. The dataset consists of 60K images, divided into a training set of 50K images and a test set of 10K images. The dataset comes with two label sets of 10 and 100 classes, respectively; we refer to the two versions as CIFAR10 and CIFAR100. In order to perform proper validation, we randomly held out a subset of the training images to form the validation set. Images of the different classes are uniformly distributed in both CIFAR10 and CIFAR100.

  • CelebA [14] is a large-scale face attributes dataset with more than 200K images of about 10K different people at varying resolutions. For this dataset, we followed the same experimental protocol as in [24] and used a subset of the people in the dataset to train and evaluate the methods, with the corresponding training, validation, and test splits adopted from [24]. All images were resized to a fixed resolution of 32×32 pixels.

TABLE II: Test Accuracy (%) on CIFAR100 for the single-rate, one-shot baseline, and adaptive-rate methods, together with each method's average accuracy and total number of training epochs.

In our experiments, we adopted the same task-specific neural network architecture that was used in the experiments of [24], namely the AllCNN architecture proposed in [20]. AllCNN is a feed-forward architecture that contains only convolution layers, without any residual connections. For the details of the AllCNN network, the reader is referred to [24].

For stochastic gradient descent optimization, we used the ADAM optimizer [11]. All models were trained for a fixed budget of epochs with an initial learning rate that was reduced by a constant factor at two predefined epochs. Weight decay was used for regularization. The input pixel values were scaled to a fixed range, and we performed simple data augmentation during training by random horizontal flipping and random shifting by a few pixels along both the horizontal and vertical axes. We tested many configurations corresponding to a variety of compression rates. The experiments were repeated five times, and the means and standard deviations of the accuracy measured on the test set are reported.

IV-B Results

Since the CIFAR10, CIFAR100, and CelebA datasets were set to the same resolution, the size of $\mathcal{X}$ is 32×32×3 in all experiments. In the following, we denote the results produced by the original training method proposed in [24] as single-rate, and those of our one-shot training method as adaptive-rate.

For the single-rate training, we trained multiple models, one for each compressed measurement size in a target set of configurations; the sizes fall into four groups, each corresponding to a different compression rate.

TABLE III: Test Accuracy (%) on CelebA for the single-rate, one-shot baseline, and adaptive-rate methods, together with each method's average accuracy and total number of training epochs.

For our adaptive-rate training method, we trained only one model, with the maximum and minimum dimensions of the compressed measurements set to the largest and smallest sizes in the target set. That is, the size of $\mathcal{M}$ in Eq. (8) equals the maximum measurement size, and $M_k^{\min}$ and $M_k^{\max}$ are set accordingly when sampling $R_k$ in Eq. (9). After training, we simply evaluated this model with the different compressed measurement sizes that were used to train the single-rate models.

In addition, to demonstrate the effectiveness of the stochastic mask $\mathcal{M}$, we also trained an MCL model with the compressed measurement of the maximum size, using the original training method in [24]. This model, denoted as baseline, is then used to evaluate the target set of compressed measurement sizes mentioned above, using the same evaluation procedure as for the adaptive-rate models. Thus, the baseline model and our (adaptive-rate) model share the same setup, representing a one-shot training setting in which one model is trained and used for multiple compressed measurement sizes. The results for the CIFAR10, CIFAR100, and CelebA datasets are shown in Tables I, II, and III, respectively. The results are grouped according to the compression rate. The average accuracy of each method and the total number of epochs used to train the model(s) for each dataset are also provided in these tables.

TABLE IV: Finetuning performance of adaptive-rate* on CIFAR10, compared with single-rate and adaptive-rate, together with each method's average accuracy and total number of training epochs.

As can be seen in Tables I, II, and III, the proposed method (adaptive-rate) clearly outperforms the baseline method. In fact, we can see that, without any modification to the original baseline training algorithm, a model trained with a large compressed measurement size (the maximum size in our case) cannot be used for other configurations with higher compression rates (i.e., smaller measurement sizes). The large variations between different runs of the baseline indicate that no semantic structure exists in the sub-tensors of the default compressed measurement, i.e., the compressed measurement of the maximum size. On the other hand, the results of adaptive-rate are more consistent between different experiment runs. This means that from the default compressed measurement of the maximum size, we can directly generate compressed measurements of smaller sizes that lead to consistent performance, indicating the existence of a semantic structure between the elements of the compressed measurement trained by the proposed method.

Regarding the comparison with the conventional single-rate approach of using multiple models, each specializing in one compression rate, we can see that a single model trained using the proposed adaptive-rate approach achieved very competitive performance. Specifically, the adaptive-rate model achieved performance close to that of the single-rate models on the CIFAR100 dataset, while its performance is slightly lower on the CIFAR10 and CelebA datasets, with a small average degradation on each. However, the slight performance loss of adaptive-rate is compensated by significant computational and memory gains. For the single-rate approach, a separate model needs to be deployed for each compressed measurement size, and training all of them requires many times more epochs of gradient updates in total. On the other hand, with adaptive-rate training, we needed to train only one model, which can be used for all the different configurations corresponding to multiple compression rates.

Comparing the inference complexity of an adaptive-rate model and a single-rate model, the adaptive-rate model requires slightly more floating-point operations. However, the difference, which results from the MCS and FS components, is very minor because the majority of the computation happens in the task-specific neural network: in our experiments, the AllCNN task-specific neural network requires millions of floating-point operations, while the MCS and FS components account for only a small fraction of that amount. In fact, the theoretical difference is so minor that we observed no actual difference in the inference run-time. For the same reason, the computational complexity required to train an adaptive-rate model for one epoch is very similar to that of a single-rate model. Thus, we use the number of training epochs to reflect the computational complexity induced by each model. Even though the total training time could better reflect the training complexity, this measure is not a reliable estimate in our case, since different experiments were conducted on different workstations with different GPU models.

TABLE V: Finetuning performance of adaptive-rate* on CIFAR100, compared with single-rate and adaptive-rate, together with each method's average accuracy and total number of training epochs.

Another benefit of the proposed training process is that adaptive-rate training can also be used to quickly identify the best configurations for a given compression rate. For example, on the CelebA dataset, among measurement shapes corresponding to the same compression rate, some configurations lead to a much better accuracy than others when each configuration is optimized separately. The same phenomenon can be observed when using adaptive-rate training, while requiring only a fraction of the computations of the former case. On CIFAR100, for the same compression rate, the ranking between the corresponding configurations is reversed, which can also be seen from the results of adaptive-rate.

Up until now, we have shown that, at the cost of a small accuracy drop, we can efficiently train and deploy a single model that can be used for inference with an adaptive compression rate. However, as we described at the end of Section III, the server side is not constrained to run a single instance of the FS module and task-specific neural network. That is, after optimizing a single model instance using the adaptive-rate method, we can fix the sensing operators and finetune the FS module and the task-specific neural network for each compressed measurement size. To demonstrate this, we performed finetuning for a small number of additional epochs for each compressed measurement size. The results, denoted as adaptive-rate*, are shown in Tables IV, V, and VI. It can be clearly seen that, by allowing separate model instances on the server side with adaptive-rate*, we obtain noticeable improvements in the overall performance, which is on par with the performance obtained using the single-rate training approach on the CIFAR10 and CelebA datasets, and clearly better on the CIFAR100 dataset. Although the training complexity (total number of epochs) of adaptive-rate* is higher than that of a single model instance, it is still far below that of single-rate.

TABLE VI: Finetuning performance of adaptive-rate* on CelebA, compared with single-rate and adaptive-rate, together with each method's average accuracy and total number of training epochs.

V Conclusions

In this paper, we proposed a novel training method and a practical implementation for Multilinear Compressive Learning models that are capable of compressive signal acquisition and prediction with an adaptive compression rate. By enabling such functionality in a remote compressive learning system, i.e., an adjustable degree of signal fidelity, the amount of data transmitted for each sample can be adjusted according to the conditions of the network. This can result in major improvements in the information content throughput of a remote sensing and learning application. Empirical evaluation of the proposed training approach showed that one can save a significant amount of computation during the training phase. Furthermore, when the server can handle multiple model instances, one can achieve performance comparable to the standard setup while still having the adaptive compression rate feature on the sensing devices and significantly lowering the number of training computations.

VI Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871449 (OpenDR). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains.

The authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.

References

  • [1] A. Adler, M. Elad, and M. Zibulevsky (2016) Compressed learning: a deep neural network approach. arXiv preprint arXiv:1610.09615.
  • [2] R. G. Baraniuk and M. B. Wakin (2009) Random projections of smooth manifolds. Foundations of Computational Mathematics 9 (1), pp. 51–77.
  • [3] R. Calderbank and S. Jafarpour (2012) Finding needles in compressed haystacks. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3441–3444.
  • [4] E. J. Candes, J. K. Romberg, and T. Tao (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics 59 (8), pp. 1207–1223.
  • [5] E. J. Candès and M. B. Wakin (2008) An introduction to compressive sampling. IEEE Signal Processing Magazine 25 (2), pp. 21–30.
  • [6] M. A. Davenport, P. Boufounos, M. B. Wakin, and R. G. Baraniuk (2010) Signal processing with compressive measurements. IEEE Journal of Selected Topics in Signal Processing 4 (2), pp. 445–460.
  • [7] M. A. Davenport, M. F. Duarte, M. B. Wakin, J. N. Laska, D. Takhar, K. F. Kelly, and R. G. Baraniuk (2007) The smashed filter for compressive classification and target recognition. In Computational Imaging V, Vol. 6498, pp. 64980H.
  • [8] L. De Lathauwer, B. De Moor, and J. Vandewalle (2000) A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 21 (4), pp. 1253–1278.
  • [9] D. L. Donoho (2006) Compressed sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306.
  • [10] B. Hollis, S. Patterson, and J. Trinkle (2018) Compressed learning for tactile object recognition. IEEE Robotics and Automation Letters 3 (3), pp. 1616–1623.
  • [11] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [12] T. G. Kolda and B. W. Bader (2009) Tensor decompositions and applications. SIAM Review 51 (3), pp. 455–500.
  • [13] A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto.
  • [14] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV).
  • [15] S. Lohit, K. Kulkarni, and P. Turaga (2016) Direct inference on compressive measurements using convolutional neural networks. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 1913–1917.
  • [16] E. Mathieu, T. Rainforth, N. Siddharth, and Y. W. Teh (2019) Disentangling disentanglement in variational autoencoders. In International Conference on Machine Learning, pp. 4402–4412.
  • [17] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Communications Surveys & Tutorials 20 (4), pp. 2923–2960.
  • [18] B. Qi, M. Wu, and L. Zhang (2017) A DNN-based object detection system on mobile cloud computing. In 2017 17th International Symposium on Communications and Information Technologies (ISCIT), pp. 1–6.
  • [19] H. Reboredo, F. Renna, R. Calderbank, and M. R. Rodrigues (2013) Compressive classification. In 2013 IEEE International Symposium on Information Theory, pp. 674–678.
  • [20] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806.
  • [21] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), pp. 1929–1958.
  • [22] D. T. Tran, M. Gabbouj, and A. Iosifidis (2020) Multilinear compressive learning with prior knowledge. arXiv preprint arXiv:2002.07203.
  • [23] D. T. Tran, M. Gabbouj, and A. Iosifidis (2020) Performance indicator in multilinear compressive learning. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1822–1828.
  • [24] D. T. Tran, M. Yamac, A. Degerli, M. Gabbouj, and A. Iosifidis (2020) Multilinear compressive learning. IEEE Transactions on Neural Networks and Learning Systems, in press.
  • [25] Y. Xu and K. F. Kelly (2019) Compressed domain image classification using a multi-rate neural network. arXiv preprint arXiv:1901.09983.
  • [26] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang (2019) Edge intelligence: paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE 107 (8), pp. 1738–1762.
  • [27] E. Zisselman, A. Adler, and M. Elad (2018) Compressed learning for image classification: a deep neural network approach. Processing, Analyzing and Learning of Images, Shapes, and Forms 19, pp. 1.