1 Introduction
Federated learning (FL) has received significant interest for its advantages over traditional (i.e., centralized) machine learning. FL is a machine learning technique in which multiple clients (e.g., devices or organizations) collaboratively train a model under the supervision of a central server. In addition to mitigating the computational load on the central server, FL allows training a model on large-scale datasets while protecting the users' privacy. The clients train the learning model on their local datasets and send the updated gradients to the central server. The server averages the received gradients and sends the new value of the global gradient to the clients for the next training epoch. This process is repeated until the trained model is obtained.
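One synchronization round of this process can be sketched in a few lines of Python. The names and the stand-in gradient function are illustrative only; a real system would run backpropagation on each client's private data:

```python
# Minimal sketch of one FL synchronization round (illustrative names;
# real deployments use ML frameworks and secure communication channels).

def local_update(global_grad, local_data, compute_grad):
    """Client side: compute a gradient on the local dataset,
    starting from the server's current global gradient."""
    return compute_grad(global_grad, local_data)

def aggregate(client_grads):
    """Server side: average the clients' gradients coordinate-wise."""
    n = len(client_grads)
    return [sum(g[i] for g in client_grads) / n
            for i in range(len(client_grads[0]))]

# Toy example: three clients, two-dimensional gradients,
# with a simple stand-in replacing real backpropagation.
clients_data = [[1.0], [2.0], [3.0]]
fake_grad = lambda g, d: [g[0] + d[0], g[1] - d[0]]
global_grad = [0.0, 0.0]
for _ in range(5):  # synchronization epochs
    updates = [local_update(global_grad, d, fake_grad) for d in clients_data]
    global_grad = aggregate(updates)
```

In a real deployment, `compute_grad` would run the local training on the client's private dataset; only the resulting gradients ever travel to the server.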
Although FL ensures a certain level of privacy by not explicitly sharing the data with the server, an attacker (or the server) could retrieve a client's training dataset using only the shared gradient Zhao et al. (2019). This privacy problem has been addressed using differential privacy (DP) Dwork et al. (2006); Dwork (2008); Ouadrhiri and Abdelhadi (2022): before sending the gradients to the server, clients protect their gradients by adding noise drawn from a probability distribution Gong et al. (2020); Yin et al. (2021); Wu et al. (2020). However, applying DP at each synchronization epoch degrades the privacy protection due to the composition theorem Dwork and Roth (2014). For example, if a client applies DP at each synchronization round with a privacy leakage $\epsilon$, then after $T$ epochs the privacy leakage becomes $T\epsilon$. Thus, a malicious server or an attacker could learn a tighter estimate of the clients' gradients.
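The effect of composition can be illustrated with a toy calculation (the numerical values are illustrative, not taken from the paper):

```python
# Under basic composition, running an epsilon-DP mechanism once per
# synchronization epoch yields a total privacy leakage of T * epsilon.

def cumulative_leakage(epsilon_per_epoch, num_epochs):
    """Total privacy leakage after num_epochs applications of DP."""
    return epsilon_per_epoch * num_epochs

per_epoch = 0.5
print(cumulative_leakage(per_epoch, 1))    # 0.5  (one round: tight budget)
print(cumulative_leakage(per_epoch, 100))  # 50.0 (100 rounds: loose budget)
```

A per-epoch budget that looks tight thus becomes loose after many synchronization epochs, which is exactly the weakness the proposed approach avoids by applying DP only once, before training.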
To control the privacy leakage, the authors in Wei et al. (2021); Kim et al. (2021); Asoodeh et al. (2021) propose approaches for determining the standard deviation $\sigma$ of the Gaussian distribution so that a predefined privacy leakage is not exceeded after $T$ synchronization epochs. Nevertheless, these approaches do not enhance privacy protection, because the determined standard deviation depends on the number of synchronization epochs $T$, and eventually the privacy leakage increases as $T$ increases. Another category of works proposes to handle the problem of privacy leakage by training the FL model via peer-to-peer communications Cyffers and Bellet (2021); Tran et al. (2021); Li et al. (2021). In these works, the server sends the initialized gradient to a client chosen randomly from all clients. This client updates the received global gradient and sends it to another client, and so on, until the last client sends the updated global gradient back to the server. These works ensure strong protection of users' privacy; however, they are vulnerable to label-flipping and data poisoning attacks Fung et al. (2020); Fang et al. (2020). Moreover, Ren et al. (2021) recently succeeded in recovering the training dataset even when the gradient was protected using DP. The authors generate a fake image input with its corresponding label using a generative regression neural network (GRNN) and feed this image to the training model at the server to calculate a fake gradient. The original training images are then retrieved by training the GRNN model to minimize the distance between the fake gradient and the true gradient. The authors rely on two main components to complete the training: the resolution of the target image and the length of the true gradient vector.
To overcome the aforementioned challenges, we propose a novel privacy-preserving approach that guarantees strong protection of users' privacy in FL. Specifically, this method includes two layers of privacy protection:

The first layer reduces the dimension of the client's training dataset using Hensel's compression. We are the first to use Hensel's Lemma (McDonald (1974), p. 340) for dimensionality reduction.

The second layer implements DP by adding noise to the compressed dataset generated by the first layer. Together, these two layers generate a privacy-preserving dataset that the client uses for local training.
Therefore, the proposed approach hides the two principal components (i.e., the resolution of the target image and the length of the gradient vector) on which attackers rely to recover the training dataset. Attackers or a malicious server will not have any visibility into the clients' original private datasets, as the training is performed on the compressed noisy dataset. Furthermore, the proposed approach prevents the privacy leakage from growing as the number of synchronization epochs increases, because DP is applied once to the original dataset before the training starts. Thus, this approach solves the problem of privacy leakage due to composition. In summary, the main contributions of this paper are as follows.

We propose an image-based data protection approach for protecting the privacy of users in FL. The proposed approach overcomes the shortcomings of the existing DP-based approaches.

We develop a new dimensionality reduction method based on Hensel's Lemma. Unlike the state-of-the-art methods, we efficiently reduce the dimension of a dataset without losing information. Moreover, the proposed dimensionality reduction method reduces the computational time and the communication overhead by reducing the size of the training dataset.

Experimental results demonstrate that our approach guarantees strong protection of users’ privacy while achieving good accuracy.
2 Proposed method
Figure 1 illustrates the main steps for training an FL model using the proposed approach. Before the training starts, the server sends the learning model architecture, the initial global gradient, and the dimension of the dataset elements to all clients, as shown in 'pre-training step 1'. In 'pre-training step 2', each client reduces the dimension of its local dataset elements (i.e., layer 1) and implements DP (i.e., layer 2) on the compressed dataset generated by the first layer. The pre-training steps 1 and 2 are performed once, before the training starts, to generate the privacy-preserving dataset used later in the training. After the pre-training steps, each client starts the local training and sends its local gradient to the server, as illustrated in 'training step 1' and 'training step 2', respectively. In 'training step 3', the server aggregates the clients' local gradients to update the global gradient. In 'training step 4', the server sends the updated global gradient to the clients. The training steps 1, 2, 3, and 4 are repeated until the trained learning model is obtained.
2.1 First layer: Dimensionality reduction using Hensel’s compression
The first layer reduces the dimension of the original dataset using Hensel’s compression. Unlike the dimensionality reduction methods proposed in the literature Marill and Green (1963); Whitney (1971); Narendra and Fukunaga (1977); Somol et al. (2004); Chen (2003); Almuallim and Dietterich (1991); Kira and Rendell (1992); Liu and Setiono (1996); Kachouri et al. (2010); Peng et al. (2005), the proposed method allows reducing the dimension of a dataset without losing information. The novelty of this paper is based on the following Hensel’s Lemma (McDonald (1974), p.340).
Lemma 1
Let $x \in \mathbb{Z}_p$. There is a unique sequence $(\alpha_i)_{i \geq 0}$, $\alpha_i \in \{0, 1, \ldots, p-1\}$, such that the series $\sum_{i=0}^{\infty} \alpha_i p^i$ tends toward $x$. This series is called the Hensel's decomposition of $x$.
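As a concrete illustration (our own example, taking $p = 10$), the Hensel decomposition of an ordinary non-negative integer is simply its sequence of base-$p$ digits:

```latex
347 = 7 \cdot 10^{0} + 4 \cdot 10^{1} + 3 \cdot 10^{2},
\qquad (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \ldots) = (7, 4, 3, 0, \ldots)
```

The uniqueness of the digits is what makes the decomposition, and hence the compression built on it, invertible.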
In our approach, we call it Hensel's compression because we go in the opposite direction; that is to say, instead of decomposing a number, we combine several numbers into one. In what follows, we explain our innovation with a use case example.
Given a dataset of images, each element (i.e., image) of the dataset is a matrix $M$ of dimension $d \times d$. The approach consists of reducing the dimension of $M$ by dividing it into sub-blocks of dimension $m \times m$, such that $m$ divides $d$. Each sub-block is combined into a single number using a base $p$ larger than the maximum pixel value, so we get a new matrix $M'$ of dimension $(d/m) \times (d/m)$. Figure 2 illustrates an example: the first subfigure 2a presents the original matrix; in the second subfigure 2b, we divide the matrix into sub-blocks of dimension $m \times m$ with $m = 2$; the last subfigure 2c shows the new matrix generated after applying Hensel's compression. Applying Hensel's compression with $m = 2$ leads to a new matrix $M'$ calculated as follows:
$M'_{1,1} = M_{1,1} + M_{1,2}\,p + M_{2,1}\,p^{2} + M_{2,2}\,p^{3}$ (1)
where $M'_{1,1}$ represents the element at the first row and first column of the matrix $M'$, and $p$ is the base used to combine the sub-block entries (for 8-bit pixel values, $p = 256$ ensures that the combination is lossless). $M'_{1,1}$ is calculated based on the sub-block located at the first row and the first column in subfigure 2b. In the same way, we calculate the other elements of the matrix $M'$ based on the sub-blocks of the matrix $M$.
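The first layer can be sketched as follows. This is a minimal pure-Python sketch under the assumptions that pixels are 8-bit values, the base is $p = 256$, and each sub-block is traversed row by row (the traversal order used in the paper's figure may differ):

```python
# Sketch of Hensel's compression for a d x d image with m x m sub-blocks.
# Assumptions: base p = 256 (one more than the max 8-bit pixel value),
# row-major traversal of each sub-block.

def hensel_compress(M, m, p=256):
    """Combine each m x m sub-block into one integer sum_k alpha_k * p**k,
    where the alpha_k are the block's pixel values."""
    d = len(M)
    assert d % m == 0, "block size must divide the matrix dimension"
    out = []
    for bi in range(0, d, m):
        row = []
        for bj in range(0, d, m):
            value, k = 0, 0
            for i in range(m):
                for j in range(m):
                    value += M[bi + i][bj + j] * p**k
                    k += 1
            row.append(value)
        out.append(row)
    return out

def hensel_decompress(C, m, p=256):
    """Invert the compression by recovering the base-p digits of each entry."""
    d = len(C) * m
    M = [[0] * d for _ in range(d)]
    for bi, row in enumerate(C):
        for bj, value in enumerate(row):
            for i in range(m):
                for j in range(m):
                    value, digit = divmod(value, p)
                    M[bi * m + i][bj * m + j] = digit
    return M

compressed = hensel_compress([[1, 2], [3, 4]], m=2)  # [[67305985]]
```

Because each entry of the compressed matrix encodes its sub-block exactly in base $p$, `hensel_decompress` recovers the original matrix bit for bit, which is what makes the reduction lossless.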
2.2 Second layer: Privacypreserving dataset
The second layer applies DP to the compressed dataset produced by the first layer to generate a privacy-preserving dataset. To be specific, we add noise drawn from the Gaussian distribution $\mathcal{N}(0, \Delta f^2 / \epsilon^2)$, which has been proved to satisfy the DP definition Dong et al. (2019), where $\epsilon$ is the privacy leakage, also known as the privacy budget, and $\Delta f$ is the sensitivity of the function on which we apply the DP mechanism. The privacy-preserving dataset is generated by adding noise to each image as follows. Assume a dataset of $n$ images, such that each image is a matrix $X$. Each point $x_{i,j}$ of $X$ is then perturbed using the following equation:
$\tilde{x}_{i,j} = x_{i,j} + \eta$ (2)
where $\tilde{x}_{i,j}$ is the perturbed point, and $\eta$ is a noise drawn from the Gaussian distribution $\mathcal{N}(0, \Delta f^2 / \epsilon^2)$. The sensitivity $\Delta f$ is the difference between the maximum and the minimum value of $X$. In our case, $\Delta f = 1$, as we apply DP after normalizing the dataset. It is important to note that decreasing the privacy leakage $\epsilon$ increases the privacy protection; $\epsilon \rightarrow 0$ is equivalent to perfect privacy protection.
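The second layer can be sketched as below. The function name and the calibration $\sigma = \Delta f / \epsilon$ are illustrative assumptions consistent with the text ($\Delta f = 1$ after normalization); the formal guarantee follows the Gaussian DP analysis of Dong et al. (2019):

```python
import random

def privatize_image(image, epsilon, sensitivity=1.0, rng=random):
    """Return a noisy copy of a 2-D image (list of rows, values in [0, 1]),
    perturbing each pixel with Gaussian noise of std sensitivity/epsilon."""
    sigma = sensitivity / epsilon
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in image]

# Smaller epsilon => larger sigma => stronger protection, lower utility.
rng = random.Random(0)  # fixed seed only to make the demo reproducible
clean = [[0.0, 0.5], [1.0, 0.25]]
noisy = privatize_image(clean, epsilon=0.5, rng=rng)
```

Since this perturbation is applied once to the (compressed) dataset before training starts, no additional privacy budget is spent at the synchronization epochs.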
3 Experiments
The objective of this section is to evaluate the impact of DP and Hensel's compression on the accuracy and the privacy protection. We developed a learning model, see Figure 3, composed of two convolutional layers, each followed by a ReLU activation function. The second convolutional layer is also followed by dropout regularization to prevent overfitting. We then add three fully connected linear layers, where the output dimension of the last linear layer is 10, which corresponds to the number of classes in our training dataset. We trained the model described above using different amounts of privacy leakage and different levels of data compression. Figure 4
shows samples of the different versions of the MNIST dataset used in training. Based on the dataset dimension, we divided these experiments into three scenarios:

Scenario 1: In this scenario, we train the learning model on the original MNIST dataset, where the dimension of each image is $28 \times 28$. This is equivalent to 100% of the data size.

Scenario 2: In this scenario, we train the learning model on the compressed MNIST dataset, where the dimension of each image is $14 \times 14$. This is equivalent to 25% of the data size.

Scenario 3: In this scenario, we train the learning model on the compressed MNIST dataset, where the dimension of each image is $7 \times 7$. This is equivalent to 6.25% of the data size.
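The data-size figures above can be double-checked with a small computation, assuming the two compressed versions are $14 \times 14$ and $7 \times 7$ (i.e., one and two rounds of $2 \times 2$ Hensel compression of the $28 \times 28$ MNIST images):

```python
# Fraction of pixel values kept after compressing a dim x dim image,
# relative to the original 28 x 28 MNIST resolution.

def relative_size(dim, original_dim=28):
    """Data size of a dim x dim image as a percentage of the original."""
    return 100.0 * (dim * dim) / (original_dim * original_dim)

print(relative_size(28))  # 100.0 (scenario 1)
print(relative_size(14))  # 25.0  (scenario 2)
print(relative_size(7))   # 6.25  (scenario 3)
```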
In each scenario, we evaluated the impact of the privacy leakage $\epsilon$ on the accuracy, considering three values of $\epsilon$. Table 1 illustrates the experiment parameters of each scenario.
Scenario  Dimension  Data size
1  28 × 28  100%
2  14 × 14  25%
3  7 × 7  6.25%
Figure 5 illustrates the accuracy of the learning model in the first scenario. Overall, we get a high accuracy by only applying DP to the original MNIST dataset, for all three values of privacy leakage. We notice that the accuracy decreases as the privacy leakage $\epsilon$ decreases, because more noise is added to the images when $\epsilon$ decreases. Regarding the privacy protection (see subfigures 4a, 4b, and 4c for scenario 1), we can still recognize what the real image contains even after adding large noise to the dataset (i.e., the case of the smallest $\epsilon$, which corresponds to the Gaussian noise with the largest variance; see subfigure 4c).
Figure 6 illustrates the accuracy of the learning model in the second scenario, in which we applied the two layers of privacy protection (i.e., Hensel's compression and DP). We obtain a high accuracy for each of the three values of privacy leakage. Regarding the privacy protection, we can see that it is hard to distinguish the content of the images, especially for the smallest privacy leakage.
Figure 7 illustrates the accuracy of the learning model in the third scenario, in which images are compressed from $28 \times 28$ to $7 \times 7$. Overall, we get a good accuracy relative to the level of privacy protection achieved. For example, for the smallest privacy leakage, the learning model achieves a good accuracy while ensuring perfect privacy protection: an attacker could not distinguish the images' content even after succeeding in recovering the training dataset. We notice that decreasing the privacy leakage $\epsilon$ increases the privacy protection while decreasing the accuracy.
To conclude, the accuracy and the privacy protection depend on the privacy leakage $\epsilon$ and on the level of data compression (i.e., Hensel's compression). The proposed approach achieves an acceptable or high accuracy while ensuring strong privacy protection. Specifically, this good tradeoff is achieved in scenario 2 (i.e., Hensel's compression to dimension $14 \times 14$) and in scenario 3 (i.e., Hensel's compression to dimension $7 \times 7$) for appropriate values of the privacy leakage.
It is important to note that training on 25% of the data size (i.e., Hensel's compression to dimension $14 \times 14$) gives roughly the same accuracy as training the learning model on 100% of the data size. Thus, the proposed dimensionality reduction method not only strengthens privacy protection but also reduces the computational overhead. However, compressing the data too much hides characteristics of the images and hence decreases the accuracy. Thus, finding the optimal tradeoff between the level of data compression and the privacy leakage, one that guarantees strong privacy protection while achieving a good accuracy, is of great importance.
4 Conclusion
In this paper, we propose a two-layer privacy-preserving method for FL. The first layer reduces the dimension of the original training dataset using Hensel's compression, whereas the second layer applies DP to the compressed dataset generated by the first layer. The experimental analysis validates the effectiveness of the proposed approach in protecting users' privacy while achieving good accuracy. The experimental results also show that the learning model accuracy depends on the dataset compression level and the DP privacy leakage $\epsilon$.
References

Almuallim and Dietterich (1991)
Almuallim, H., Dietterich, T.G.,
1991.
Learning with many irrelevant features, in: Proceedings of the Ninth National Conference on Artificial Intelligence  Volume 2, AAAI Press. p. 547–552.
 Asoodeh et al. (2021) Asoodeh, S., Liao, J., Calmon, F.P., Kosut, O., Sankar, L., 2021. Three variants of differential privacy: Lossless conversion and applications. arXiv:2008.06529.

Chen (2003)
Chen, X.w., 2003.
An improved branch and bound algorithm for feature selection 24, 1925–1933.
doi:10.1016/S01678655(03)000205.  Cyffers and Bellet (2021) Cyffers, E., Bellet, A., 2021. Privacy amplification by decentralization. arXiv:2012.05326.
 Dong et al. (2019) Dong, J., Roth, A., Su, W.J., 2019. Gaussian differential privacy. arXiv:1905.02383.
 Dwork (2008) Dwork, C., 2008. Differential privacy: A survey of results, in: Agrawal, M., Du, D., Duan, Z., Li, A. (Eds.), Theory and Applications of Models of Computation, Springer Berlin Heidelberg, Berlin, Heidelberg. pp. 1–19.
Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K., Smith, A., 2006. Calibrating noise to sensitivity in private data analysis, in: Proceedings of the Third Conference on Theory of Cryptography, Springer-Verlag, Berlin, Heidelberg. pp. 265–284.
Dwork and Roth (2014) Dwork, C., Roth, A., 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 211–407. URL: https://doi.org/10.1561/0400000042, doi:10.1561/0400000042.
Fang et al. (2020) Fang, M., Cao, X., Jia, J., Gong, N., 2020. Local model poisoning attacks to byzantine-robust federated learning, in: 29th USENIX Security Symposium (USENIX Security 20), USENIX Association. pp. 1605–1622.
 Fung et al. (2020) Fung, C., Yoon, C.J.M., Beschastnikh, I., 2020. The limitations of federated learning in sybil settings, in: 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), USENIX Association, San Sebastian. pp. 301–316.
Gong et al. (2020) Gong, M., Feng, J., Xie, Y., 2020. Privacy-enhanced multi-party deep learning. Neural Networks 121, 484–496. doi:10.1016/j.neunet.2019.10.001.
 Kachouri et al. (2010) Kachouri, R., Djemal, K., Maaref, H., 2010. Adaptive feature selection for heterogeneous image databases, in: 2010 2nd International Conference on Image Processing Theory, Tools and Applications, pp. 26–31. doi:10.1109/IPTA.2010.5586751.
 Kim et al. (2021) Kim, M., Günlü, O., Schaefer, R.F., 2021. Federated learning with local differential privacy: Tradeoffs between privacy, utility, and communication, in: ICASSP 2021  2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2650–2654. doi:10.1109/ICASSP39728.2021.9413764.
 Kira and Rendell (1992) Kira, K., Rendell, L.A., 1992. The feature selection problem: Traditional methods and a new algorithm, in: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press. p. 129–134.
Li et al. (2021) Li, Y., Zhou, Y., Jolfaei, A., Yu, D., Xu, G., Zheng, X., 2021. Privacy-preserving federated learning framework based on chained secure multi-party computing. IEEE Internet of Things Journal 8, 6178–6186. doi:10.1109/JIOT.2020.3022911.
Liu and Setiono (1996) Liu, H., Setiono, R., 1996. Feature selection and classification - a probabilistic wrapper approach, in: Proceedings of the 9th International Conference on Industrial and Engineering Applications of AI and ES, pp. 419–424.
 Marill and Green (1963) Marill, T., Green, D., 1963. On the effectiveness of receptors in recognition systems. IEEE Transactions on Information Theory 9, 11–17. doi:10.1109/TIT.1963.1057810.
McDonald (1974) McDonald, B.R., 1974. Finite Rings with Identity. M. Dekker, New York.
Narendra and Fukunaga (1977) Narendra, Fukunaga, 1977. A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers C-26, 917–922. doi:10.1109/TC.1977.1674939.
 Ouadrhiri and Abdelhadi (2022) Ouadrhiri, A.E., Abdelhadi, A., 2022. Differential privacy for deep and federated learning: A survey. IEEE Access 10, 22359–22380. doi:10.1109/ACCESS.2022.3151670.
Peng et al. (2005) Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238. doi:10.1109/TPAMI.2005.159.
Ren et al. (2021) Ren, H., Deng, J., Xie, X., 2021. GRNN: Generative regression neural network – a data leakage attack for federated learning. arXiv:2105.00529.
Somol et al. (2004) Somol, P., Pudil, P., Kittler, J., 2004. Fast branch & bound algorithms for optimal feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 900–912. doi:10.1109/TPAMI.2004.28.
Tran et al. (2021) Tran, A.T., Luong, T.D., Karnjana, J., Huynh, V.N., 2021. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation. Neurocomputing 422, 245–262. doi:10.1016/j.neucom.2020.10.014.
Wei et al. (2021) Wei, K., Li, J., Ding, M., Ma, C., Su, H., Zhang, B., Poor, H.V., 2021. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing, 1–1. doi:10.1109/TMC.2021.3056991.
Whitney (1971) Whitney, A., 1971. A direct method of nonparametric measurement selection. IEEE Transactions on Computers C-20, 1100–1103. doi:10.1109/TC.1971.223410.
Wu et al. (2020) Wu, H., Chen, C.Y., Wang, L., 2020. A theoretical perspective on differentially private federated multi-task learning. arXiv:2011.07179.
Yin et al. (2021) Yin, L., Feng, J., Xun, H., Sun, Z., Cheng, X., 2021. A privacy-preserving federated learning for multiparty data sharing in social IoTs. IEEE Transactions on Network Science and Engineering, 1–1. doi:10.1109/TNSE.2021.3074185.
 Zhao et al. (2019) Zhao, J., Chen, Y., Zhang, W., 2019. Differential privacy preservation in deep learning: Challenges, opportunities and solutions. IEEE Access 7, 48901–48911. doi:10.1109/ACCESS.2019.2909559.