Data confidentiality has significant relevance in the financial field. The disclosure of sensitive information such as credit scoring or exchange apikeys, can lead to, e.g., learn about users’ credit risk —and therefore decide whether to lend money— or ultimately money theft. A data breach inevitably leads to serious consequences affecting the reputation of the financial institution.
The importance of cyber-security is even more crucial for the cryptocurrency industry, whose current valuation —$ trillion in 2021— attracts a rapidly increasing number of hackers. Only in the last two years, four large hacks occurred in the digital currency space111https://www.investopedia.com/news/largest-cryptocurrency-hacks-so-far-year/.
Companies are therefore pushing on equipping their platforms with state-of-the-art security mechanisms. There are widely accepted solutions such as storage encryption, transport-level encryption, intrusion detection systems, firewalls, IP white-listing, which are used and help in preventing common cyber-attacks. However, there are still open issues —especially in terms of data-in-use protection against privileged attackers (e.g., a malicious insider)— that requires more advanced approaches. Companies aim for methods where the customer does not need to trust them. There are technologies and cryptography schemes that can be leveraged for this purpose. In this regard, the adoption of hardware-assisted Trusted Execution Environments (TEE) (e.g., Intel SGX)  and Homomorphic Encryption (HE)  is making its way into the industrial community to ensure this kind of protection. HE is extremely powerful since it enables arbitrary computations on ciphered data and allows the user to trust no one. A TEE such as SGX provides also strong security guarantees enabling the execution of sensitive code in an isolated and measurable area of CPUs, with the only limitation that the user needs to trust the hardware manufacturers such as Intel and that the application’s state is sealed in the hardware . Unfortunately, the exclusive use of HE is not doable since it is affected by a non-negligible execution time overhead, and by a large cipher text expansion. Moreover, a major problem of HE is the so-called unverifiable conditional issue, which forces a program running on a third-party host, and processing homomorphically encrypted data, to request that a client decrypts intermediate functional results to proceed further in the execution . This introduces additional synchronization points and performance overhead, and increases the risk of information disclosure (e.g. via side channel analysis of the program workflow). For this reason, the trend for the majority of financial companies is to prefer TEEs.
The X-Margin Inc.222https://xmargin.io/ company, instead, uses both security mechanisms. It offers secure credit risk scoring by shielding computations in Intel SGX enclaves and performs HE-based Zero-Knowledge Proofs to offer cryptographic Proof of Security, Computation and Encrypted Input. In this paper, we explore an innovative approach for X-Margin which would allow to replace the use of SGX, thus removing Intel from the chain-of-trust and being hardware agnostic: Functional Encryption (FE) . This is a cryptographic scheme similar to HE that does both evaluation and decryption of the result of a function at the same time, without leaking the private key for data decryption and without leaking information about the plaintexts. This means that —unlike HE— there is no need anymore of doing synchronizations between the client and the computation entity for intermediate results. In a FE model, there are special “evaluation keys” that only allow the functional evaluation of ciphered data. An additional advantage given by the FE scheme is that it enables Attribute-based encryption (ABE) schemes, where the encrypted data is linked with a set of attributes and secret keys along with certain policies that allow to control which ciphertexts can be decrypted depending on the possessed attributes. Unlike the classical encryption systems, where “access to the encrypted data is all or nothing, one can either decrypt and read the entire plain or one learns nothing at all about the plain other than its length”, FE tries to change this paradigm allowing a more fine-grained access to encrypted data and improving its flexibility. We describe how FE can be leveraged for X-Margin
financial computations and —most importantly— evaluate the performance impact of this innovative scheme. We overview a solution that computes credit risk from an encrypted borrower’s data with the FE, thus preserving data privacy in a cloud environment even from the company itself. As a first step, we implemented an ad-hoc neural network for the purpose of credit risk scoring, which acts as a binary classifier saying whether a loan defaults. The training is conducted on a plain dataset. The resulting weight configuration downstream the training step is saved for the second step. We built the functional encryption scheme using the weights configuration obtained during the first step. We used a symmetric private key quadratic multivariate polynomial scheme called SGP
. As from the definition of functional encryption, the result of decryption is not the record itself, but the evaluation of a function of it, that is represented by the credit risk associated to the borrower. At the end, we computed the credit risk via two FE functions: one that computes the score to say whether the borrower defaults, and the other one that computes the score whether the borrower doesn’t default. Then, the final result is normalized among the scores in order to provide the probability of default. Results from experimental evaluation show that the FE entails a non-negligible overhead on the classification time. Using 20 attributes, the application took 17.3s to compute the score with 200 borrowers in the system. It is important to notice that the scoring classification job is highly parallelizable, thus performance can be improved using multiple nodes.
The rest of this paper is organized as follows: In Section 2, we provide a background on credit scoring and on the FE scheme. Afterwards, in Section 3, the X-Margin case study is presented. In Section 4, the architecture of the proposed solution is discussed. Section 5 presents results from the experimental evaluation. Finally, Section 6 concludes the document.
In this section, we provide a background on the two main elements covered in this work, i.e., credit scoring and functional encryption.
2.1 Principles of Credit Scoring
Credit scoring is a method that is historically used by banks to estimate the risk of lending money to an individual. It determines the ability of a person to repay the debts. The higher is the credit score, the higher is the probability of obtaining a loan with low interest rates. On the other hand, people with a lower credit score must pay higher interest rates on their loans. The process of giving a loan is influenced by many factors: characteristics of borrowers in terms of who they are, their economic situation, the amount of the requested loan, its purpose, and the collateral type. The risk is estimated using elements of quantitative and qualitative analysis. The quantitative analysis takes into account an assessment of the financial standing of customers. It could also consider cash flow analysis of customers’ accounts together with their credit history. While the qualitative assessment includes other information such as the education, employment, industry in which they operate, or the way of keeping accounts.
2.2 Fundamentals of Functional Encryption
Functional encryption (FE) is relatively a new encryption technique, which was introduced by Amit Sahai and Brent Water in 2008 [FE_pres].
FE enables selective access control of sensitive data basing on specific functions . In an FE scheme, a decryption key is associated with the function . Therefore, the decryption of an encrypted data through , provides the function evaluation , and nothing more about .
Concerning differences between FE and FHE, from one side they result similar, because once the message has been encrypted, both evaluate a function of the message. However, FHE suffers from to the additional step of decrypting the evaluation of the function, since the result of its computation is still an encrypted result. With FE instead, the result of the decryption already represents the evaluation of the function. FE has been successfully applied to cloud environments and to obfuscation mechanisms , that can be useful for example to protect intellectual property. Unfortunately, up to now, industries seldom adopt such advanced cryptography technologies: the majority of cryptography technologies used by industries were developed in early 2000s .
Only recently, thanks to the ambitious project FENTEC (Functional ENcryption TEChnologies) 333https://fentec.eu/, there has been a tentative to propose FE solutions for privacy preserving in a wide range of sectors from clinical data 444https://github.com/fentec-project/Selective-Access-to-Clinical-Data, to public transportation 555 https://github.com/fentec-project/FE-anonymous-heatmap.
There are several types of schemes that can be assimilated to the FE, these are: attribute-based encryption (ABE) , , , identity-based encryption (IBE) , the ones that implement inner product functions ,  and nonlinear (at most quadratic) polynomials. In this work, the focus is on a symmetric private key FE scheme called SGP, that implements a two dimensional quadratic polynomial function.
represents the vectors we want to encrypt, whileis a custom matrix. The result of decryption in this case will be the evaluation of the function above.
Regarding security, it provides an adaptive security under chosen-plaintext attacks (IND-CPA security) which is based on bilinear pairings, with the following primitives:
: generates a secret key for the SGP scheme.
: encrypts input vectors and with the secret key and returns the appropriate ciphertext .
: given in input a custom matrix and the secret key , derives the FE key .
: given in input the ciphertext the FE key and the custom matrix , gets the evaluation of that is .
3 The X-Margin Case Study
X-Margin Inc. provides privacy-preserving credit scoring in the crypto-currency credit market. With over $B of crypto-collateral being used to generate over $B of interest on a quarterly basis, credit is one of the most rapidly growing sectors of the emerging crypto-currency finance ecosystem. X-Margin allows borrowers to supply lenders with real-time portfolio risk metrics, while preserving the privacy of trades, positions, and other sensitive information. Borrowers benefit from improved lending terms, as they can display their risk in real-time and assure lenders they are trading responsibly. Lenders benefit from increased visibility and real-time information. X-Margin calculates a variety of Risk Metrics on each user’s portfolio, including Equity, Balance, Margin Usage, Maximum Loss (SPAN or VaR calculations), Aggregate absolute Delta, and
. The company uses Zero Knowledge technology to ensure that the credit scoring system is functioning in an unbiased and privacy preserving way, and as the users expect.
X-Margin leverages TEE technologies, along with HE-based cryptographic Proof of Security, Computation and Encrypted Input, to ensure the privacy of users’ sensitive data and guarantee risk analysis. Intel Software Guard Extensions (SGX), is the enabling TEE technology that is currently used to protect and attest the sensitive computations. SGX is a recent extension of modern CPUs, which ensures confidentiality even against super-privileged users running as root. X-Margin uses HE-based schemes for Proof of Computation including a Proof of Encrypted Input. Together, the proofs demonstrate that a given function being executed within the SGX enclave corresponds exactly to a symbolic representation stored in the server, outside the enclave —the untrusted ’host’— and that the data on which the function operates is received encrypted from an external source. X-Margin proves that the functions the system is instructed to run are run in the enclave, without revealing the functions themselves. The Proof of Computation & Encrypt ed Input uses the Enhanced Skyline protocol, which is a proprietary protocol developed by X-Margin’s team of cryptographers. Figure 1 shows the current architecture of the X-Margin solution. There is a SGX-based Key Management System (KMS) that holds users’ apikeys, and a CScore unit where the actual private computation takes place, protected by the SGX secure enclave, and proven by cryptographic proofs. It queries spot/derivative exchanges and provide risk score metrics to a Time Series Database (TSDB).
The goal of the company is to further improve their solution by raising the concept of trust to a higher level. In the current situation, in fact, its customers must trust the hardware manufacturer, i.e., Intel. It would be ideal for the company to remove also Intel from the chain-of-trust. Furthermore, in the current status, the credit scoring application is strictly bounded to a particular hardware where the SGX ISA extension —and the related motherboard support— is available. This also entails migration constraints because the application’s state is sealed in the hardware. X-Margin’s R&D is therefore exploring new innovative security techniques to cope with the mentioned limitations.
4 The FE-based Credit Scoring Solution
The functional encryption paradigm represents an opportunity to overcome limitations posed by the SGX-based credit scoring solution. It ensures security properties similar to HE but with the advantage that it allows to learn specific functions of the encrypted data. Overall, the FE-based credit scoring works as follows. In a first phase, the neural network is trained leveraging plain datasets in X-Margin’s premises. This is acceptable since the training process does not use any customers’ sensitive data but it is conducted on anonymous data. At the end of this learning phase, coefficients needed to compute the credit risk of a given borrower are obtained. These are then used to generate keys. In this regard, according to the FE scheme principles, a trusted key authority receives the coefficients and generates two types of keys, i.e.: public/private keys for the encryption/decryption of data, and operational keys needed to perform the functional computation on encrypted data, which can only be used to decrypt the result of the computation but not the data itself. At the end of this phase, the X-Margin credit scoring application is finally configured. From now on, the customer will encrypt the data along with the public keys received by the authority. The ciphered information is then sent to the untrusted X-Margin premises running on cloud. Here, the credit risk associated to the borrower is computed on ciphered data. It is important to notice that the computation of the credit risk associated to the borrower does not include any confidential borrower’s data. Only the result of this computation (i.e., the scoring evaluation) will be accessible to X-Margin.
Figure 2 shows the overall architecture of the FE-based credit scoring solution, together with the steps performed during normal functioning.
Data pre-processing: At startup, X-Margin performs a pre-processing activity on the data-set that will be used during the training phase. Modifications to the dataset are needed to adapt the subsequent computation to the FE requirements, and also to improve the accuracy of the neural network model, such as:
Removes columns and rows containing null values, discard columns with a high percentage of null values, as well as rows with null values inside.
Scales and normalizes columns.
Transform data in integer form in order to ensure compatibility with the FE scheme properties,
Neural Network Training: In this phase, the training phase takes place using clear data. This is acceptable given that the leveraged dataset contains anonymous entries. An ad-hoc neural network was defined. We leveraged and adapted the Adam optimization algorithm, with a learning rate of and a batch size of
. Training epochs have been set to. The neural network has a single hidden layer and its activation layer is represented by a square element-wise, instead of sigmoid. This choice will be better explained in the rest of this section.
Let be our input vector, that is the borrower’s record, be the weight matrix of the first layer and be the weight matrix of the hidden layer. In our case, , and . The output of the neural network is:
The result of the classification is represented by a pair of scores, since , which are mutually exclusive and that are not normalized. In order to get the final result in terms of probability, a softmax function has been applied in the untrusted context side.
At the end of the training step, and are saved and used afterwards for the encryption task and for deriving FE keys in the trusted context.
Secret Key Generation The trusted authority generates a secret key for the symmetric SGP scheme according to the FE scheme, that is:
It then sends to the X-Margin user.
Data Encryption The X-margin user uses to encrypt the borrower’s data and returns the appropriate ciphertext that is:
Here, respect to the classical SGP scheme, we have replaced the parameter by .
Encryption Projection The X-Margin user takes the encryption and weight matrix obtained at step 2 to produce a projection of , that is:
represents the encryption of , that is sent to the untrusted context.
Secret Key Projection The trusted authority takes the secret key obtained at step 3 and the weight matrix obtained at step 2 to get a projection of the secret key:
represents the secret key for encryption of .
FE Key Generation The trusted authority provides also the FE keys to the untrusted party to compute the credit risk of the borrower. There are two FE keys: one used to decrypt the score of default and one used to decrypt the score of not default of the borrower. Therefore, each FE key is used to decrypt a different output of the neural network:
accepts as parameter the secret key , obtained at the previous step, and the diagonal matrix , that is the diagonalized version of , that represents the row of the matrix . It should be noted that is the vector containing the weights connected to a single label.
Then, the trusted authority sends the obtained FE keys to the untrusted context.
Credit Score Computation The untrusted context, computes the credit risk scores of the borrower through the FE key provided by the trusted authority at the previous step. This means that for each label it evaluates the function of , that is . In order to do that, it exploits the the following decryption primitive of the SGP scheme:
For each label , accepts , that as we saw represents the encryption of sent by the X-Margin user, the derived FE key and the diagonal matrix sent by the trusted authority.
Since , there are two functions: one that computes the score that the borrower defaults and one that he/she doesn’t default. Such scores can be seen as mutually exclusive, therefore we can apply the soft max function on them in order to normalize the results and obtain the related probabilities.
Finally, the untrusted context sends back to the X-Margin user the score obtained at 8.
4.2 Details on the FE Scheme for Credit Risk Scoring
The aim of this section is to draw connections between the neural network output function provided at 2 and the FE function at 1, so as to understand what is the information needed to build the FE scheme.
Given the FE function at 1, we can transform it from a multivariate to a single variable function.
From the other side, we can see the neural network function at 2 as the the result obtained for each single label:
Where is a vector containing the weights connected to a single label , is the borrower’s vector and is the weight matrix associated to the first layer. Applying the following manipulation to each function obtained in 10, it is possible to get the FE function at 9:
Here, we have transformed the first matrix product, switching the operands and applying the transpose on them. Indeed, we have removed the square element-wise operation on it and, in order to keep the equality we have replaced with its diagonalized version and we have multiplied by the product . Finally, we have replaced the products with the obtained vector in order to highlight the equivalence with the 9. The custom matrix is the diagonal matrix that is different for each label, while the other parts of the function don’t change.
We have shown that such neural network architecture computes the FE function 9, for each label . Each function has, as variable, the matrix product named , and produces as output respectively the score that a borrower defaults and not. From now on will be indicated without the transpose operations inside, in order to not burden the discussion.
5 Experimental Performance Evaluation
We conducted an experimental evaluation of the privacy-preserving scoring solution. The goal was to estimate the performance overhead given by the adoption of the FE-based processing. More precisely, our focus was on the execution time needed to encrypt borrowers’ data, and to compute the credit risk score, i.e., the evaluation of the functional decryption occurring on X-Margin premises. These tests did not take into account the classification algorithm accuracy since our main goal is to focus on the sole performance.
We have evaluated our model on a Windows 10 machine equipped with an AMD Ryzen 3600X with 6-Core Processor at 3.80 GHz and 16GB of RAM. Execution times (in ms) are the average of 5 repeated tests.
As a first experiment, we evaluated the encryption phase (i.e., steps 4 & 5) by increasing the number of users in the system up to 1000 borrowers, and also varying the number of borrowers’ attributes (in the range [5,25]).
Figure 3 shows the outcomes of the evaluation. It can be noticed that the encryption is not significantly time consuming. The encryption of one borrower in what we defined the worst situation —i.e., characterized by 50 attributes— takes on average , while in the best situation —i.e., 5 attributes— it takes . The execution time increases linearly with the number of borrowers.
The second evaluation we made refers to the FE-based scoring computation time, which is critical for the actual usability of the proposed solution. It is important to notice that this operation includes both the actual functional computation and the final result decryption. Figure 3 shows that the overhead is significant, reaching in the worst condition —i.e., 50 attributes and 1000 users— a computation time of . It must be noticed that in real situations, the number of attributes is lower, in the order of 20. Nevertheless, performance are still not acceptable. However, the type of job we implemented with FE is highly parallelizable, and the deployment on multiple nodes can make the adoption of the FE-based credit scoring solution acceptable.
In this paper, we presented an innovative solution for privacy-preserving credit risk scoring, which leverages the emerging Functional Encryption cryptography that allows to learn the result of a specific function using only encrypted data. We built a credit risk model using the quadratic polynomial FE symmetric scheme SGP for the X-Margin Inc. company, whose case study was used to validate the effectiveness of the proposed approach. Results from the experimental campaign show that the solution performs well under certain conditions, but the overhead needs to be properly managed in case of a real production deployment. We discovered that with a limited number of borrowers’ attributes (e.g. ) the encryption time and the classification time are acceptable. Future developments of the current solution will be focused on improving performance by evaluating new public key quadratic schemes (e.g., ), and setting up a multi-node parallel architecture.
-  (2015) Simple functional encryption schemes for inner products. In Public-Key Cryptography – PKC 2015, J. Katz (Ed.), Berlin, Heidelberg, pp. 733–751. External Links: Cited by: §2.2.
-  (2018) Multi-input functional encryption for inner products: function-hiding realizations and constructions without pairings. In Advances in Cryptology – CRYPTO 2018, H. Shacham and A. Boldyreva (Eds.), Cham, pp. 597–627. External Links: Cited by: §2.2.
-  (2017) FAME: fast attribute-based message encryption. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, New York, NY, USA, pp. 665–682. External Links: Cited by: §2.2.
-  (2004) Efficient selective-id secure identity-based encryption without random oracles. In Advances in Cryptology - EUROCRYPT 2004, C. Cachin and J. L. Camenisch (Eds.), Berlin, Heidelberg, pp. 223–238. External Links: Cited by: §2.2.
-  (2011) Functional encryption: definitions and challenges. In Theory of Cryptography, Y. Ishai (Ed.), Berlin, Heidelberg, pp. 253–273. External Links: Cited by: §1.
-  (2017) Cloudifying critical applications: a use case from the power grid domain. In 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Vol. , pp. 363–370. External Links: Cited by: §1.
-  (2020) VISE: combining intel sgx and homomorphic encryption for cloud industrial control systems. IEEE Transactions on Computers (), pp. 1–1. External Links: Cited by: §1.
-  (2018) Reading in the dark: classifying encrypted digits with functional encryption. Note: Cryptology ePrint Archive, Report 2018/206https://eprint.iacr.org/2018/206 Cited by: §1.
-  (2016) Candidate indistinguishability obfuscation and functional encryption for all circuits. SIAM Journal on Computing 45 (3), pp. 882–929. External Links: Cited by: §2.2.
-  (2013) Attribute-based encryption for circuits from multilinear maps. In Advances in Cryptology – CRYPTO 2013, R. Canetti and J. A. Garay (Eds.), Berlin, Heidelberg, pp. 479–499. External Links: Cited by: §2.2.
-  (2020) A new paradigm for public-key functional encryption for degree-2 polynomials. In Public-Key Cryptography – PKC 2020, A. Kiayias, M. Kohlweiss, P. Wallden, and V. Zikas (Eds.), Cham, pp. 95–120. External Links: Cited by: §6.
Fully homomorphic encryption using ideal lattices.
Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC ’09, New York, NY, USA, pp. 169–178. External Links: Cited by: §1.
-  (2006) Attribute-based encryption for fine-grained access control of encrypted data. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ’06, New York, NY, USA, pp. 89–98. External Links: Cited by: §2.2.
-  (2017) A Few Thoughts on Cryptographic Engineering. Note: . https://blog.cryptographyengineering.com/2017/07/02/beyond-public-key-encryption/ Cited by: §2.2.
-  (2013) Innovative instructions and software model for isolated execution. In Proceedings of the 2Nd International Workshop on Hardware and Architectural Support for Security and Privacy, HASP ’13, New York, NY, USA, pp. 10:1–10:1. External Links: Cited by: §1.