User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition

05/23/2019 · by Jeremy Charlier, et al.

The new European financial regulations, such as PSD2, are changing retail banking services. Notably, the monitoring of personal expenses is now open to institutions other than retail banks. Nonetheless, retail banks can leverage the user-device authentication events of their mobile banking applications to enhance personalized financial advertisement. To profile these authentication events, we rely on tensor decomposition, a higher-dimensional analogue of matrix decomposition. Because of the imbalance between the number of users and the number of devices, we use Paratuck2, which expresses a tensor as a product of matrices and diagonal tensors. We highlight why Paratuck2 is more appropriate in this case than the popular CP tensor decomposition, which decomposes a tensor as a sum of rank-one tensors. However, the computation of Paratuck2 is computationally intensive. We propose a new APproximate HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2 more accurately and faster than other popular approaches based on alternating least squares or gradient descent. The results of Paratuck2 are then used to predict users' authentication with neural networks. We apply our method to the concrete case of targeting clients for financial advertising campaigns based on the authentication events generated by mobile banking applications.


I Introduction

Endorsed by the European objective of promoting financial exchanges between the Euro members, a new financial regulation for Personal Finance Management (PFM) entered into force in 2018. PFM is the monitoring of the revenues and expenses of a bank account. PFM is achieved with a Personal Finance Application (PFA), otherwise known as a mobile banking application. The revised Payment Service Directive, PSD2, allows every person having a bank account to use a PFA from a third-party provider to manage their personal finances, and thus threatens to transform the banks into simple vaults. Nevertheless, retail banks now have the opportunity to leverage the user-device authentication on their own mobile banking applications. Through the regular authentication events, a bank builds a financial awareness profile for every client: it measures the frequency of connections per day, the time of the connections and the type of mobile device used to authenticate, such as a smartphone or a tablet. The more frequently a client authenticates to the mobile banking application, the more likely the client has a high interest in finance, and therefore the more likely the client will be interested in financial recommendations. Such a client will be contacted first to advertise financial products for wealth optimization. However, it is very common for the same person to use several mobile devices for the same application. The devices therefore generate dozens of authentication events per day and create a strong imbalance between the number of users and the number of devices. Consequently, the modeling of the user-device authentication is multidimensional and complex. To address this challenge, we rely on tensor decompositions, a higher-order analogue of matrix decompositions, since they have proven powerful for modeling multidimensional interactions. In particular, we model the imbalanced user-device authentication with the Paratuck2 tensor decomposition, which decomposes a tensor as a product of matrices and sparse diagonal tensors. We summarize the four main contributions of the paper as follows:

  • We have designed an innovative APproximate HEssian Newton minimization algorithm, APHEN, applied to the Paratuck2 tensor decomposition. The algorithm reduces the numerical errors to a minimum while delivering state-of-the-art performance. Additionally, APHEN does not require full knowledge of the Hessian matrix to achieve its superior convergence.

  • We highlight the superior capabilities of APHEN and Hessian-based resolution algorithms for complex tensor decompositions such as Paratuck2, in comparison to other popular resolution schemes, which have mainly been studied for the popular CP decomposition.

  • Additionally, we have applied the unique advantages of Paratuck2 to interaction modeling with imbalanced data. We justify with numerical simulations the use of Paratuck2 instead of the more popular CP tensor decomposition, which decomposes a tensor as a sum of rank-one tensors. In our application, one user can use several computers, and we have thus considered the imbalance between the number of users and the number of computers.

  • Finally, we have developed an approach combining the Paratuck2 tensor decomposition and neural networks for authentication monitoring. Based on Paratuck2, the neural networks predict the users' authentication to estimate the future financial awareness of the clients. Therefore, the banks can better advertise their products by contacting the clients who are the most likely to be interested.

The remainder of this paper is organized as follows. Section II surveys the latest research publications related to user-device authentication and tensor decompositions. Section III describes the fundamentals of tensor decompositions, the APHEN algorithm and the other popular resolution algorithms. Additionally, it introduces basic knowledge of neural networks and machine learning predictions. Section IV illustrates the convergence speed of APHEN in comparison to the other popular schemes. Then, Paratuck2 and neural networks are used to predict the imbalanced users' authentication, with the aim of improving the subscription rates to financial products during the banks' advertising campaigns. Finally, we conclude the paper and highlight pointers to future work in the last section.

II Related Work

Literature on User-Device Authentication User-device authentication has evolved significantly over the past few years thanks to new technologies. A reliable user-device authentication scheme, based on a graphical, user-friendly authentication, was proposed in [1]. In [2], the use of Location Based Authentication (LBA) was studied. The development of recent embedded systems within smart devices leads to new authentication processes that were considered pure fiction only a few years ago. In [3], the usage of the embedded camera of smart devices for authentication by face recognition was assessed. The face image taken by the camera of the mobile device was sent to a background server performing the calculation, whose result then reverts to the phone. In a similar approach, the use of iris recognition was proposed in [4]. However, the authors showed this kind of authentication was not the preferred choice of the end user. Additionally, the sensors embedded into smart devices allow other types of biometric authentication. In [5], the different biometric authentication methods that could be used with smart devices were presented, such as pulse-response, fingerprint or even ear shape. Although biometric or LBA solutions might offer a higher level of security for authentication, their extension toward large-scale usage is complex. In [6], the authors developed the idea that public-key infrastructure based systems, such as strong passwords in combination with physical tokens, for example a cell phone, would be more likely to be used and largely deployed. Nonetheless, it is worth mentioning that the most common procedure for mobile device authentication is still a code of four or six digits [7].

Literature on Tensor Decomposition The interaction modeling of user-device authentication is multidimensional and complex, both specificities of tensor analysis and tensor decompositions. Tensor decompositions are an extension to higher dimensions of two-dimensional matrix decompositions [8, 9]. This extension is explained by the evolution towards more extensive analysis in the presence of an increasing number of features within the datasets. As a result, different tensor decompositions, or tensor factorizations, exist with different resolution algorithms for different types of applications [10, 11]. Meanwhile, the scope of tensor applications has grown rapidly. In [12], the CP tensor decomposition was used for data mining and signal processing; a low-rank approximation was developed for fast computation on large datasets. The CP tensor decomposition was also used in location-based social networks for identification profiling [13]: in the experiments, a number of anomalies were identified in the check-in behavior of the users. Following the trend of social network studies, the algorithm Tensorcast was specifically designed to study interactions on social networks [14]; different sources were incorporated in coupled tensors for time-evolving networks such as Twitter to predict the evolution of the social network activity. Tensor predictive analytics have also been addressed in [15], where a tensor factorization method is described for spatial and temporal autocorrelations, with analysis and predictions demonstrated on traffic transportation data. Similarly, the Rescal tensor decomposition was used in [16] for review spam detection; the approach highlighted the interactions between the reviewers and the products, and it led to a better accuracy of spam detection when compared to other methods. With similar objectives, a specific algorithm was developed in [17] for heterogeneous data, relying on the Higher-Order Singular Value Decomposition (HOSVD) to describe the frequencies of various signals. To address real-world problems with the velocity of streaming data, multi-aspect streaming tensor completion is underlined in [18]. Although the approach allows building dynamic tensors, it relies on the CP tensor decomposition, which lacks the linear independence of the latent variables in each order [19].

In this paper, we extend the state of the art of tensor numerical resolution introduced for the CP decomposition in [11, 20] and [21] by proposing APHEN, an APproximate HEssian Newton resolution that does not require complete knowledge of the Hessian matrix, thereby removing the limitation of the computational cost of the Hessian matrix. APHEN reduces the numerical error at convergence to a minimum while having a similar or faster computation time than other popular resolution schemes. Additionally, we highlight experimentally the limitations of CP [19] and we propose the use of Paratuck2 for imbalanced data. Finally, we rely on neural networks to predict the users' authentication for the personalized financial recommendations occurring during advertising campaigns that rely on mobile banking application connections.

III Model Description

In this section, we describe both the CP and the Paratuck2 tensor decompositions, initially introduced in [8, 9] and [22]. Subsequently, we describe the error minimization algorithm APHEN, which is at the core of our contribution. Finally, we briefly describe neural networks applied to Paratuck2 with the aim of predicting the latent users' authentication.

III-A CP and Paratuck2 Tensor Decompositions

Notation The terminology hereinafter follows the one described by Kolda and Bader in [10] and commonly used. Scalars are denoted by lowercase letters, $a$. Vectors and matrices are described by boldface lowercase letters and boldface capital letters, respectively $\mathbf{a}$ and $\mathbf{A}$. Higher-order tensors are represented using Euler script notation, $\mathcal{X}$. The transpose of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^{\mathsf{T}}$. The inverse of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^{-1}$.

Algebra Operations The outer product between two vectors, $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$, is denoted by the symbol $\circ$.

$$(\mathbf{a} \circ \mathbf{b})_{ij} = a_i\, b_j \quad\quad (1)$$

The Kronecker product between two matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{K \times L}$, denoted by $\mathbf{A} \otimes \mathbf{B}$, results in a matrix $\mathbf{C} \in \mathbb{R}^{IK \times JL}$.

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1J}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix} \quad\quad (2)$$

The Khatri-Rao product between two matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{K \times J}$, denoted by $\mathbf{A} \odot \mathbf{B}$, results in a matrix $\mathbf{C}$ of size $IK \times J$. It is the column-wise Kronecker product.

$$\mathbf{A} \odot \mathbf{B} = \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_J \otimes \mathbf{b}_J \end{bmatrix} \quad\quad (3)$$
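As a concrete illustration of the three products, the following minimal numpy sketch computes them for small matrices. The paper's own implementations are in Julia; this Python fragment is only our illustration of the standard definitions (1)-(3).

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(3, 2)    # I x J = 3 x 2
B = np.arange(8, dtype=float).reshape(4, 2)    # K x L = 4 x 2

# Outer product of two vectors, (a o b)_ij = a_i * b_j, as in (1)
outer = np.outer(A[:, 0], B[:, 0])             # 3 x 4

# Kronecker product (2): an (IK) x (JL) block matrix of a_ij * B
kron = np.kron(A, B)                           # 12 x 4

# Khatri-Rao product (3): column-wise Kronecker product, (IK) x J
khatri_rao = np.stack(
    [np.kron(A[:, j], B[:, j]) for j in range(A.shape[1])], axis=1)  # 12 x 2
```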

Tensor Definition $\mathcal{X}$ is called an $n$-way tensor if $\mathcal{X}$ is an $n$-dimensional array. It is expressed as $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_n}$.

Tensor Operations The square root of the sum of all tensor entries squared of the tensor $\mathcal{X}$ defines its norm.

$$\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_n}^2} \quad\quad (4)$$

The rank-$R$ of a tensor $\mathcal{X}$ is the minimum number of rank-one linear components that fit $\mathcal{X}$ exactly.

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r^{(1)} \circ \mathbf{a}_r^{(2)} \circ \cdots \circ \mathbf{a}_r^{(n)} \quad\quad (5)$$

Definition of the CP Decomposition We motivate the use of Paratuck2 over CP by the imbalance in our dataset; effectively, CP lacks the linear independence of the factors in each order [19]. The CP decomposition was introduced in [8, 9]. The tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ is defined as a sum of rank-one tensors. The number of rank-one tensors is determined by the rank, denoted by $R$, of the tensor $\mathcal{X}$. The CP decomposition is expressed as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \quad\quad (6)$$

where $\mathbf{a}_r$, $\mathbf{b}_r$ and $\mathbf{c}_r$ are vectors of size $I$, $J$ and $K$. Each vector refers to one dimension and one rank of the tensor $\mathcal{X}$.
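For concreteness, the rank-$R$ reconstruction of a three-way tensor from its CP factors can be sketched in a few lines of numpy; this is our illustration of equation (6), not the paper's code.

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Rebuild X = sum_r a_r o b_r o c_r from factor matrices of
    shapes (I, R), (J, R) and (K, R), as in equation (6)."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 10, 10, 10, 3
X = cp_to_tensor(rng.random((I, R)), rng.random((J, R)), rng.random((K, R)))
```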

Definition of the Paratuck2 Decomposition Paratuck2 was introduced by Harshman and Lundy in [22]. The tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ is described as a product of matrices and tensors

$$X_k = \mathbf{A}\, D_k^{A}\, \mathbf{H}\, D_k^{B}\, \mathbf{B}^{\mathsf{T}}, \quad k = 1, \ldots, K \quad\quad (7)$$

where $\mathbf{A}$, $\mathbf{H}$ and $\mathbf{B}$ are matrices of size $I \times P$, $P \times Q$ and $J \times Q$. The diagonal matrices $D_k^{A}$ and $D_k^{B}$ are the slices of the tensors $\mathcal{D}^{A}$ and $\mathcal{D}^{B}$. The latent factors $P$ and $Q$ are related to the rank of each object set, as illustrated in Figure 1. More precisely, the columns of the matrices $\mathbf{A}$ and $\mathbf{B}$ represent the $P$ and $Q$ latent factors. The matrix $\mathbf{H}$ underlines the asymmetry between the $P$ latent factors and the $Q$ latent factors. The tensors $\mathcal{D}^{A}$ and $\mathcal{D}^{B}$ measure the evolution of the latent factors along the third dimension.


Figure 1: Paratuck2 decomposition of a three-way tensor with dimension notations
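Read slice by slice, equation (7) translates directly into code. The following numpy sketch, our illustration with the shapes of Figure 1, rebuilds a tensor from Paratuck2 factors; the diagonals of the sparse tensors are stored row-wise for compactness.

```python
import numpy as np

def paratuck2_to_tensor(A, DA, H, DB, B):
    """Rebuild X of shape (I, J, K) from Paratuck2 factors, as in (7):
    A (I, P), H (P, Q), B (J, Q), and the diagonals of the sparse
    tensors D^A and D^B stored row-wise in DA (K, P) and DB (K, Q)."""
    (I, P), (J, Q), K = A.shape, B.shape, DA.shape[0]
    X = np.empty((I, J, K))
    for k in range(K):
        # Frontal slice k: X_k = A D_k^A H D_k^B B^T
        X[:, :, k] = A @ np.diag(DA[k]) @ H @ np.diag(DB[k]) @ B.T
    return X
```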

III-B APHEN and Approximate Derivatives

The Alternating Least Squares (ALS) method is the most commonly used method for tensor resolution, as initially described in [8, 9]. It was applied by Bro in [23] to Paratuck2. Nonetheless, with larger data sets, the convergence performance of the ALS method decreases. To overcome this, gradient-based resolution algorithms for tensors have emerged [20, 19, 11]. However, it is well known that gradient descent schemes are very sensitive to the initial guess and to local minima. Therefore, we propose APHEN, a resolution scheme that relies on the Newton conjugate gradient method but does not require knowledge of the complete Hessian matrix. Last but not least, the algorithm is applied to the Paratuck2 tensor decomposition illustrated in Figure 1.

The objective minimization function is denoted by $f$.

$$f(\mathbf{x}) = \frac{1}{2}\, \|\mathcal{X} - \hat{\mathcal{X}}\|^2 \quad\quad (8)$$

The tensor $\hat{\mathcal{X}}$ is the approximation of $\mathcal{X}$ built from the decomposition, with the matrices $\mathbf{A}$, $\mathbf{H}$ and $\mathbf{B}$ initially randomized. The diagonal entries of the tensors $\mathcal{D}^{A}$ and $\mathcal{D}^{B}$ are set to 1 at the beginning of the minimization process.

The vector $\mathbf{x}$ is a flattened vector containing all the entries involved in the decomposition scheme to build $\hat{\mathcal{X}}$.

$$\mathbf{x} = \left[\, \operatorname{vec}(\mathbf{A});\ \operatorname{vec}(\mathcal{D}^{A});\ \operatorname{vec}(\mathbf{H});\ \operatorname{vec}(\mathcal{D}^{B});\ \operatorname{vec}(\mathbf{B}) \,\right] \quad\quad (9)$$
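A possible packing of the unknowns into the vector x of equation (9), together with its inverse, is sketched below; the ordering of the blocks is our assumption.

```python
import numpy as np

def flatten_factors(A, DA, H, DB, B):
    """Pack all Paratuck2 unknowns into a single vector x, as in (9)."""
    return np.concatenate([M.ravel() for M in (A, DA, H, DB, B)])

def unflatten_factors(x, I, J, K, P, Q):
    """Recover the factor matrices and diagonal blocks from x."""
    sizes = [I * P, K * P, P * Q, K * Q, J * Q]
    A, DA, H, DB, B = np.split(x, np.cumsum(sizes)[:-1])
    return (A.reshape(I, P), DA.reshape(K, P), H.reshape(P, Q),
            DB.reshape(K, Q), B.reshape(J, Q))
```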

Using the notation of equation (9), we can derive the gradient and the Hessian matrix related to the decomposition.

The gradient, denoted by $\nabla f$, is a vector containing all the first derivatives of the function $f$ with respect to $\mathbf{x}$.

$$\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^{\mathsf{T}} \quad\quad (10)$$

The Hessian matrix, $\mathbf{Hes}$, is the matrix containing the second derivatives of the function $f$ with respect to $\mathbf{x}$.

$$\mathbf{Hes}_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j} \quad\quad (11)$$

The Newton conjugate gradient algorithm minimizes the function $f$ according to the quadratic model below.

$$f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}\, \mathbf{Hes}\, (\mathbf{x} - \mathbf{x}_0) \quad\quad (12)$$

The variable $\mathbf{x}_0$ is the initial guess, $\nabla f$ the gradient of $f$ and $\mathbf{Hes}$ the Hessian matrix of $f$. If the Hessian matrix is positive definite, then the local minimum of the function $f$ is determined by setting the gradient of the quadratic form to zero.

$$\mathbf{x} = \mathbf{x}_0 - \mathbf{Hes}^{-1}\, \nabla f(\mathbf{x}_0) \quad\quad (13)$$

Since the gradient and the Hessian matrix are computed with finite differences, the only prerequisite for the Paratuck2 tensor decomposition is the factorization equation (7). Thus, the method can be transposed to other decompositions, such as CP, by merely changing the tensor decomposition equation. The approximate gradient is based on a fourth-order formula (14) to ensure a reliable approximation [24].

$$\frac{\partial f}{\partial x_i}(\mathbf{x}) \approx \frac{f(\mathbf{x} - 2\varepsilon \mathbf{e}_i) - 8 f(\mathbf{x} - \varepsilon \mathbf{e}_i) + 8 f(\mathbf{x} + \varepsilon \mathbf{e}_i) - f(\mathbf{x} + 2\varepsilon \mathbf{e}_i)}{12\, \varepsilon} \quad\quad (14)$$

In formula (14), the index $i$ is the index of the variable for which the derivative is evaluated. The vector $\mathbf{e}_i$ is the $i$-th unit vector. The term $\varepsilon$ is the perturbation, and it is fixed small enough to achieve the convergence of the iterative process.
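The fourth-order formula (14) is straightforward to implement; a direct, unoptimized numpy version, our sketch, reads:

```python
import numpy as np

def approx_gradient(f, x, eps=1e-4):
    """Fourth-order central finite-difference gradient of f at x, per (14)."""
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0                      # i-th unit vector
        g[i] = (f(x - 2 * eps * e) - 8 * f(x - eps * e)
                + 8 * f(x + eps * e) - f(x + 2 * eps * e)) / (12 * eps)
    return g
```

Each gradient evaluation costs four function evaluations per coordinate, which is one more reason why bypassing the full Hessian matters.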

Computing the exact inverse of the Hessian matrix raises numerical difficulties. However, as described by Wright and Nocedal in [25], the Newton algorithm does not require complete knowledge of the Hessian matrix. During the computation of the inverse of the Hessian matrix, the Hessian matrix is multiplied with a descent direction vector, resulting in a vector. Therefore, only the result of the Hessian-vector product is required. Using a Taylor expansion, this product is approximated by equation (15)

$$\mathbf{Hes}(\mathbf{x})\, \mathbf{p} \approx \frac{\nabla f(\mathbf{x} + \varepsilon \mathbf{p}) - \nabla f(\mathbf{x})}{\varepsilon} \quad\quad (15)$$

with $\varepsilon$ the perturbation and $\mathbf{p}$ the descent direction vector, fixed equal to the gradient at initialization. As a result, the extensive computation of the full Hessian matrix is bypassed using only the gradient. Finally, the complete APHEN resolution scheme is presented in Algorithm 1.
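Equation (15) similarly reduces to a one-line gradient difference; in this sketch of ours, ε must again be small, but not so small that round-off error dominates.

```python
def hessian_vector_product(grad, x, p, eps=1e-6):
    """Approximate Hes(x) @ p from two gradient evaluations, per (15)."""
    return (grad(x + eps * p) - grad(x)) / eps
```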

Theoretical convergence rate APHEN is based on Newton's iterative method, but it relies on an approximation of the Hessian matrix instead of the exact Hessian matrix. The reason is that although the convergence of the exact Newton's method is quadratic [25], the computation of the exact Hessian matrix is too time consuming for tensor applications. Therefore, APHEN has a superlinear convergence such that

$$\lim_{k \to \infty} \frac{\left\| \left( \mathbf{B}_k - \mathbf{Hes}(\mathbf{x}^*) \right) \mathbf{p}_k \right\|}{\| \mathbf{p}_k \|} = 0 \quad\quad (16)$$

with $\mathbf{x}^*$ the point of convergence, $\mathbf{p}_k$ the search direction and $\mathbf{B}_k$ the approximation of the Hessian matrix. In practice, the convergence rate is described by the equation below.

$$\lim_{k \to \infty} \frac{\| \mathbf{x}_{k+1} - \mathbf{x}^* \|}{\| \mathbf{x}_k - \mathbf{x}^* \|} = 0 \quad\quad (17)$$
Data: tensor $\mathcal{X}$, latent factors $P$, $Q$
Result: $\mathbf{A}$, $\mathcal{D}^{A}$, $\mathbf{H}$, $\mathcal{D}^{B}$, $\mathbf{B}$ from the tensor decomposition
begin
      A ← random initialization; H ← random initialization; B ← random initialization
      set $D_k^{A}$ equal to 1 for $k = 1, \ldots, K$; set $D_k^{B}$ equal to 1 for $k = 1, \ldots, K$
      x ← flatten(A, $\mathcal{D}^{A}$, H, $\mathcal{D}^{B}$, B) as described in (9)
      /* Error Minimization Loop */
      repeat
            g ← gradient of $f$ at x with (14); p ← −g, the initial descent direction
            /* Search Direction CG Loop */
            repeat
                  /* update rules as described in [25] */
                  CG step on the Newton system, using the Hessian-vector product (15), to determine the search direction p; m ← m + 1
            until maximum number of iterations or stopping criteria
            α ← Wolfe's line search for the optimal step size; x ← x + α p
      until maximum number of iterations or stopping criteria
      return x
end
Algorithm 1: APHEN algorithm applied to the Paratuck2 decomposition for a tensor $\mathcal{X}$ with latent factors $P$, $Q$
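Putting the pieces together, a minimal end-to-end sketch of the scheme in Algorithm 1 can be obtained with SciPy's Newton-CG driver, which accepts exactly a gradient callback and a Hessian-vector-product callback; SciPy's line search stands in for the Wolfe search of Algorithm 1. This is our Python illustration reusing the helpers sketched above (the paper's implementation is in Julia), not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def f_residual(x, X, dims):
    """Objective (8): half the squared norm of the reconstruction error."""
    A, DA, H, DB, B = unflatten_factors(x, *dims)
    return 0.5 * np.sum((X - paratuck2_to_tensor(A, DA, H, DB, B)) ** 2)

I, J, K, P, Q = 5, 5, 5, 2, 3
dims = (I, J, K, P, Q)
rng = np.random.default_rng(1)
X = rng.random((I, J, K))
# Random starting point (Algorithm 1 initializes the diagonals to 1 instead).
x0 = rng.random(I * P + K * P + P * Q + K * Q + J * Q)

grad = lambda x, X, d: approx_gradient(lambda z: f_residual(z, X, d), x)
res = minimize(f_residual, x0, args=(X, dims), method='Newton-CG',
               jac=grad,
               hessp=lambda x, p, X, d: hessian_vector_product(
                   lambda z: grad(z, X, d), x, p),
               options={'maxiter': 30})
A, DA, H, DB, B = unflatten_factors(res.x, *dims)
```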

III-C Latent Predictions on Paratuck2 Tensor Decomposition

Besides a Paratuck2 application of APHEN for user-device authentication, our contribution resides in latent predictions. The tensor decomposition highlights latent variables, but they model only past interactions. Hereinafter, the aim is to leverage past information to predict the users' authentication. We briefly describe the machine learning regression and the neural networks used in our experiments.

Decision Trees (DT) are a widely used machine learning technique [26]. They are used to predict the value of a variable by learning simple decision rules from the data [27, 28]. However, their regression decision rules have some limitations. Therefore, outpacing DT capabilities, neural networks, including the Multi-Layer Perceptron (MLP), the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network, and their applications have grown dramatically over the past few years [29]. An MLP consists of at least three layers: one input layer, one output layer and one or more hidden layers [30]. Each neuron of a hidden layer transforms the values of the previous layer with a non-linear activation function. Although the MLP is applied in deep learning, it lacks the possibility of modeling short-term and long-term events. This feature is found in the LSTM [30]. The LSTM has a memory block connected to the input gate and the output gate. The memory block is activated through a forget gate, resetting the memory information. However, for classification and computer vision, the CNN is worth considering. In a CNN, the neurons are capable of extracting high-order features in successive layers [31]. Through proper classification, the CNN is able to detect and predict various tasks, including activity recognition [32, 33].

IV Experiments

First, we highlight the numerical advantages of APHEN in comparison to other popular numerical schemes. Second, we rely on APHEN for the interaction modeling of the user-device authentication and its prediction.

IV-A APHEN vs Other Numerical Schemes

Hereinafter, we investigate the convergence behavior of APHEN in comparison to other numerical resolution methods. First, we define the concepts of convergence rate and convergence speed. Then, we compare APHEN with six different algorithms applied to Paratuck2:

  • ALS, Alternating Least Squares [23, 34]

  • GD, Gradient Descent [25]

  • NAG, Nesterov Accelerated Gradient [35]

  • SAGA [36]

  • Adam [37]

  • BFGS [38, 39, 40, 41]

The simulations are conducted on a PC with an Intel Core i7 CPU and 16GB of RAM. All the resolution schemes have been implemented in Julia.

Convergence speed definition The definition of the convergence rate in (17) does not illustrate the time evolution between iterations. Therefore, we define two notions: the iteration-based convergence speed and the time-based convergence speed. The convergence speed is defined as the absolute value of the linear slope of each error curve shown afterward. The bigger the convergence speed, the faster the convergence and the lower the numerical errors of the tensor decomposition. The iteration-based convergence speed and the time-based convergence speed measure the evolution of the numerical errors according to the iterations and according to the time, respectively.
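Assuming, as the curves in Figures 2 and 3 suggest, that the error decays roughly linearly on a logarithmic scale, the convergence speed can be estimated as the absolute slope of a linear fit; this is our sketch, not the paper's measurement code.

```python
import numpy as np

def convergence_speed(errors, xs):
    """Absolute slope of the linear fit of log10(errors) against xs,
    where xs holds iteration counts (iteration-based speed) or
    elapsed times in seconds (time-based speed)."""
    slope, _intercept = np.polyfit(xs, np.log10(errors), 1)
    return abs(slope)
```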

Numerical Convergence Highlights Seven tensor sizes have been defined, 5×5×5, 10×10×10, 15×10×10, 15×15×15, 25×20×15, 50×40×20 and 100×100×20, with respective latent factors (P, Q) equal to (2, 3), (3, 4), (5, 4), (5, 6), (10, 9), (15, 14) and (3, 5). The tensor dimensions and the latent factors have been chosen arbitrarily, since the experiments have shown similar results for any tensor and any combination of latent factors. Each tensor entry is incremented by one in comparison to the previous entry, with the initial entry fixed to one. Additionally, the same convergence criterion on the residual error is used for all the simulations.

Figure 2 highlights the convergence speeds of the different resolution schemes for a tensor of size 10×10×10 with latent factors (3, 4). ALS, BFGS and APHEN show significantly superior convergence speeds, both iteration-based and time-based. The ALS scheme decreases the fastest at the beginning of the process, but it rapidly fails to determine the solution having the lowest numerical errors. Although APHEN has a slightly longer computation time than ALS, it is the only method capable of determining the optimal solution. Surprisingly, all the gradient schemes, including Adam, have significantly lower convergence speeds.

Figure 3 highlights the accuracy and the execution time of the different resolution schemes for a tensor of size 15×15×15. The accuracy is derived from the residual errors at convergence. APHEN has the best accuracy, followed by ALS, BFGS, Adam and the other schemes. BFGS and Adam have longer execution times than APHEN for a significantly lower accuracy at convergence. ALS has a slightly faster execution time than APHEN, but APHEN has a better accuracy at convergence.

These graphical results are completed by Tables I and II. Table I shows that APHEN has the fastest convergence speeds in all the simulations, followed closely by ALS, BFGS and Adam. The performance of the other schemes is significantly lower. Furthermore, Table II highlights the superiority of APHEN in determining the solution having the lowest numerical residual errors at the convergence of the calculation.

To summarize, we showed that APHEN provides faster convergence speeds and lower residual errors for a similar execution time. Thus, we use APHEN with Paratuck2 in our work.

Convergence | Tensor Size | (P, Q) | ALS | GD | NAG | Adam | SAGA | BFGS | APHEN
Iteration | 5×5×5 | (2, 3) | 0.0367 | 0.0001 | 0.0059 | 0.0002 | 0.0002 | 0.0322 | 0.0676
Iteration | 10×10×10 | (3, 4) | 0.0231 | 0.0001 | 0.0028 | 0.0001 | 0.0042 | 0.0196 | 0.0490
Iteration | 15×10×10 | (5, 4) | 0.0212 | 0.0001 | 0.0029 | 0.0001 | 0.0043 | 0.0238 | 0.0451
Iteration | 15×15×15 | (5, 6) | 0.0135 | 0.0096 | 0.0001 | 0.0001 | 0.0001 | 0.0136 | 0.0268
Iteration | 25×20×15 | (10, 9) | 0.0136 | 0.0045 | 0.0001 | 0.0109 | 0.0001 | 0.0146 | 0.0250
Iteration | 50×40×20 | (15, 14) | 0.0132 | 0.0001 | 0.0017 | 0.0001 | 0.0001 | 0.0137 | 0.0232
Iteration | 100×100×20 | (3, 5) | 0.0856 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0913 | 0.1032
Time | 5×5×5 | (2, 3) | 0.2706 | 0.1543 | 0.029 | 0.1514 | 0.0335 | 0.2364 | 0.2883
Time | 10×10×10 | (3, 4) | 0.0245 | 0.0157 | 0.0025 | 0.0152 | 0.0033 | 0.0245 | 0.0339
Time | 15×10×10 | (5, 4) | 0.0146 | 0.0107 | 0.0001 | 0.0076 | 0.0017 | 0.013 | 0.0156
Time | 15×15×15 | (5, 6) | 0.0055 | 0.0017 | 0.0001 | 0.0001 | 0.0005 | 0.0049 | 0.0058
Time | 25×20×15 | (10, 9) | 0.0042 | 0.002 | 0.0001 | 0.0020 | 0.0001 | 0.0037 | 0.0044
Time | 50×40×20 | (15, 14) | 0.0020 | 0.0015 | 0.0002 | 0.0020 | 0.0002 | 0.0022 | 0.0033
Time | 100×100×20 | (3, 5) | 0.0013 | 0.0005 | 0.0001 | 0.0001 | 0.0001 | 0.0013 | 0.0015
Table I: Convergence speeds of the different resolution schemes (bigger is better). ALS is gradient-free; GD, NAG, Adam and SAGA are Hessian-free; BFGS and APHEN rely on a Hessian approximation.
Tensor Size | (P, Q) | ALS | GD | NAG | Adam | SAGA | BFGS | APHEN
5×5×5 | (2, 3) | 80.8952 | 73.3299 | 8.6596 | 75.0865 | 10.0273 | 79.2534 | 99.9999
10×10×10 | (3, 4) | 62.3542 | 50.3829 | 6.9655 | 62.7238 | 10.2185 | 66.3565 | 98.8941
15×10×10 | (5, 4) | 70.2395 | 64.4161 | 5.8585 | 65.3552 | 11.2226 | 66.0316 | 88.0582
15×15×15 | (5, 6) | 66.0539 | 62.1681 | 5.8687 | 59.9574 | 6.4757 | 61.6300 | 80.1714
25×20×15 | (10, 9) | 65.4691 | 43.6754 | 4.5421 | 44.1859 | 10.5425 | 57.0254 | 68.5696
50×40×20 | (15, 14) | 72.3543 | 50.3830 | 6.3749 | 62.7238 | 7.2364 | 66.3566 | 87.9709
100×100×20 | (3, 5) | 49.4730 | 38.7512 | 2.8462 | 48.1267 | 3.6195 | 49.3348 | 55.7678
Table II: Accuracy of the different resolution schemes at convergence (bigger is better). ALS is gradient-free; GD, NAG, Adam and SAGA are Hessian-free; BFGS and APHEN rely on a Hessian approximation.
Figure 2: Iteration-based and time-based convergence of the different numerical resolution schemes applied to Paratuck2 for a tensor of size 10×10×10 with latent factors (3, 4)
Figure 3: Accuracy of each resolution method (left column) with their respective execution times (right column) at convergence, applied to Paratuck2 for a tensor of size 15×15×15 with latent factors (5, 6)

IV-B User-Device Authentication Monitoring for Financial Recommendation

First, we discuss the completion and the resolution of the tensor. Second, we rely on the results of Paratuck2 to predict the authentication events with neural networks.

User-Computer Authentication and Data Availability For the sake of the reproducibility of the experiments, we present the approach on a public data set. In 2014, the Los Alamos National Laboratory published the anonymized user-computer authentication logs of its enterprise network [42], available at https://csr.lanl.gov/data/auth/. Each authentication event is composed of the authentication time (in Unix time), the user label and the computer label, for instance "1,U1,C1". In total, more than 11,000 users and 22,000 computers are listed, representing 13 GB of data.

Construction of the user-computer authentication tensor We randomly select 150 users and 300 computers within the dataset, representing more than 60 million lines. The first two months of authentication events have been compressed into 50 time intervals, corresponding to 25 working days per month. A tensor of size 150×300×50 is built. The first dimension, denoted by $I$, represents the users; the second dimension, denoted by $J$, the computers; and the last dimension, $K$, stands for the time intervals.
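A possible construction of this tensor from the raw LANL events is sketched below. The file name, the selection of the first 150 users and 300 computers (the paper samples them randomly) and the equal-width time bins are our illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Events follow the "1,U1,C1" format: Unix time, user label, computer label.
logs = pd.read_csv('auth.txt', names=['time', 'user', 'computer'])
users = logs['user'].unique()[:150]
computers = logs['computer'].unique()[:300]
logs = logs[logs['user'].isin(users) & logs['computer'].isin(computers)]

u = logs['user'].map({name: i for i, name in enumerate(users)}).to_numpy()
c = logs['computer'].map({name: j for j, name in enumerate(computers)}).to_numpy()
t = pd.cut(logs['time'], bins=50, labels=False).to_numpy()  # 50 time intervals

X = np.zeros((150, 300, 50))
np.add.at(X, (u, c, t), 1)  # count events per (user, computer, interval)
```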

Limitations of the CP decomposition The CP decomposition expresses the original tensor as a sum of rank-one tensors. Therefore, the user-computer authentication tensor is decomposed as a sum of user-computer-time rank-one tensors. However, in the case of strong imbalance, CP leads to underfitting or overfitting one of the dimensions [19]. Within the dataset, we can find 2 users that connect to at least 20 different computers. Therefore, a rank equal to 2, one per user, underfits the computer connections. A rank equal to 20, one per machine, overfits the number of users. In Table III, the underfitting is underlined by significant residual errors at convergence. The overfitting is detected by a good understanding of the data, since the residual errors tend to be small. Hence, the Paratuck2 decomposition is chosen to model properly each dimension of the original tensor.

Tensor Size | Rank | Residual Errors
2×20×30 | 2 | 50.275
2×20×30 | 20 | 1.147
Table III: In CP, for an imbalanced dataset, underfitting one dimension is highlighted by significant residual errors. Overfitting is difficult to measure because of the low residual errors; a good understanding of the data is required to estimate it.
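The experiment behind Table III can be reproduced with any CP solver; a hedged sketch with the tensorly library (not the paper's own code, and on synthetic data, so the exact values will differ) illustrates the procedure:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Fit CP at rank 2 (one per user) and rank 20 (one per computer) and
# compare the residual errors at convergence, as in Table III.
X = tl.tensor(np.random.default_rng(0).random((2, 20, 30)))
for rank in (2, 20):
    cp = parafac(X, rank=rank, n_iter_max=500, init='random')
    print(rank, float(tl.norm(X - tl.cp_to_tensor(cp))))
```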

Paratuck2 Tensor Resolution Paratuck2 decomposes the main tensor into a product of matrices and sparse tensors, as shown in Figure 4. The matrix A factorizes the users into groups. We observe 15 different groups of users, and therefore P equals 15. The sparse tensor $\mathcal{D}^{A}$ reflects the temporal evolution of the connections of the user groups. The matrix H represents the asymmetry between the user groups and the computer groups. We notice 25 different groups of machines related to different authentication profiles, and consequently Q equals 25. The sparse tensor $\mathcal{D}^{B}$ illustrates the temporal evolution of the connections of the computer groups. Finally, the matrix B factorizes the computers into latent groups of computers.


Figure 4: Paratuck2 decomposition applied to user-computer authentication. The neural network predictions are performed on the tensor $\mathcal{D}^{A}$.

Latent Predictions for Financial Recommendation To achieve higher subscription rates during the advertising campaigns of financial products, we explore latent predictions for targeted recommendation based on the future user-computer authentication. The results of Paratuck2 contain the users' temporal information and the computers' temporal information in the sparse tensors $\mathcal{D}^{A}$ and $\mathcal{D}^{B}$, respectively. Predicting the users' authentication allows the banks to build a more complete financial awareness profile of their clients for optimized advertisement.

In Figure 5, we highlight the results of the predictions of the users' authentication for a specific group of clients, corresponding to one specific latent factor of $P$. Four different methods have been used for the predictions: DT, MLP, CNN and LSTM. All methods have been trained on a six-week period. Then, the users' authentication events for the next two weeks are predicted with a rolling time window of one day; a minimal sketch of such a predictor is given after this paragraph. Figure 5 highlights visually that the LSTM models the future users' authentication the most accurately. It is followed by the MLP, the DT and, finally, the CNN. We support this preliminary statement with six well-known error measures: the Mean Absolute Error (MAE), the Mean Directional Accuracy (MDA), the Pearson correlation, the Jaccard distance, the cosine similarity and the Root Mean Square Error (RMSE) are used to determine objectively the most accurate predictive method. Table IV reports the error measures related to Figure 5. As previously seen, the LSTM is the closest to the true authentication since it has the lowest error values. The MLP comes second, the DT third, and the CNN last.
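A minimal sketch of such an LSTM predictor, in Keras, is given below; the window length, the layer sizes and the training setup are our assumptions, not the paper's configuration, and the series is synthetic here.

```python
import numpy as np
from tensorflow import keras

def make_windows(series, window=5):
    """Sliding windows of past days and the next-day target."""
    xs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    return xs[..., None], series[window:]   # (samples, window, 1), (samples,)

# One latent factor of D^A over the 50 time intervals (synthetic stand-in).
series = np.random.default_rng(3).random(50)
x_train, y_train = make_windows(series[:30])  # first six weeks for training

model = keras.Sequential([keras.layers.LSTM(32, input_shape=(5, 1)),
                          keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, epochs=100, verbose=0)

# Rolling one-day-ahead forecast over the last two weeks.
preds = [model.predict(series[i - 5:i][None, :, None], verbose=0)[0, 0]
         for i in range(30, 50)]
```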

To conclude, with the aim of better targeting the clients that might be interested in financial products during the bank's advertising campaigns, the LSTM combined with Paratuck2 models the future users' authentication best. As the majority of the users' authentication events are sequence-based, it is natural that the LSTM gives the best predictions: each user has a recurrent pattern in the authentication process depending on the activities of the day. Therefore, by using APHEN for Paratuck2 and the LSTM for the predictions, a bank gains a very competitive advantage for personalized product recommendation, based only on its clients' authentication on the mobile application.

Figure 5: Two-week prediction of the evolution of the latent users' authentication according to the different models used

Error Measure | DT | MLP | CNN | LSTM
MAE | 0.0965 | 0.0506 | 0.1106 | 0.0379
MDA | 0.1579 | 0.7447 | 0.5263 | 0.6842
Pearson corr. | 0.8537 | 0.9598 | 0.8885 | 0.9753
Jaccard dist. | 0.2257 | 0.1206 | 0.2648 | 0.0911
Cosine sim. | 0.9587 | 0.9891 | 0.9745 | 0.9914
RMSE | 0.1306 | 0.0695 | 0.3140 | 0.0477

V Conclusion and Future Work

In this paper, we presented a Hessian-based algorithm, APHEN, that does not require full knowledge of the Hessian matrix. It was applied to the resolution of the Paratuck2 tensor decomposition. APHEN reduces to a minimum the numerical errors inherited from the tensor decomposition. Furthermore, it has higher convergence speeds than other popular methods such as NAG or Adam. We used derivative approximations evaluated with finite difference schemes to propose a framework accessible to all tensor decompositions. The experiments were conducted on tensors of different sizes with different latent factors. Additionally, we showcased an application in the context of mobile banking applications. We used Paratuck2 and state-of-the-art machine learning and neural networks to profile and predict the latent users' authentication. By modeling the clients' past and future authentication on their mobile application, the banks are able to build a financial awareness profile of their clients to advertise different types of products. The banks have realized the promising potential of the clients' digital behavior in facing the increasing competition coming from the new regulatory directives.

As future work, we plan to show the versatility of APHEN across all tensor decompositions. We will compare APHEN's performance for all existing tensor decompositions against the other existing tensor resolution algorithms specific to each decomposition. Then, we will assess the influence of the line search and the performance of adaptive line searches, while improving the GPU compatibility of the algorithm to increase the size of the experiments. Finally, the financial recommendation depending on the user-device authentication on a mobile banking application will be further extended. The navigation usage, the time gap between each action and the type of device used will be monitored to further improve the bank's advertising campaigns of their products to the appropriate clients.

References

  • [1] G. Skinner, “Cyber security for younger demographics: A graphic based authentication and authorisation framework,” in Region 10 Conference (TENCON), 2016 IEEE.   IEEE, 2016.
  • [2] M. Adeka, K. O. Anoh, M. Ngala, S. Shepherd, E. Ibrahim, I. Elfergani, A. Hussaini, J. Rodriguez, and R. A. Abd-Alhameed, “Africa: cyber-security and its mutual impacts with computerisation, miniaturisation and location-based authentication,” 2017.
  • [3] H. Li and X. Zhu, “Face recognition technology research and implementation based on mobile phone system,” in Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016 12th International Conference on.   IEEE, 2016.
  • [4] N. Chhabra and R. Dutta, “Low quality iris detection in smart phone: A survey,” 2016.
  • [5] R. Spolaor, Q. Li, M. Monaro, M. Conti, L. Gamberini, and G. Sartori, “Biometric authentication methods on smartphones: A survey,” PsychNology Journal, vol. 14, no. 2-3, 2016.
  • [6] M. Theofanos, S. Garfinkel, and Y.-Y. Choong, “Secure and usable enterprise authentication: Lessons from the field,” IEEE Security & Privacy, vol. 14, no. 5, 2016.
  • [7] X. Bultel, J. Dreier, M. Giraud, M. Izaute, T. Kheyrkhah, P. Lafourcade, D. Lakhzoum, V. Marlin, and L. Motá, “Security analysis and psychological study of authentication methods with pin codes,” in IEEE 12th International Conference on Research Challenges in Information Science (RCIS 2018), 2018.
  • [8] R. A. Harshman, “Foundations of the parafac procedure: Models and conditions for an explanatory multimodal factor analysis,” 1970.
  • [9] J. D. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition,” Psychometrika, vol. 35, no. 3, 1970.
  • [10] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM review, vol. 51, no. 3, 2009.
  • [11] E. Acar, T. G. Kolda, and D. M. Dunlavy, “All-at-once optimization for coupled matrix and tensor factorizations,” arXiv preprint arXiv:1105.3422, 2011.
  • [12] A. P. Da Silva, “Tensor techniques for signal processing: algorithms for canonical polyadic decomposition,” Ph.D. dissertation, Université Grenoble Alpes, 2016.
  • [13] E. Papalexakis, K. Pelechrinis, and C. Faloutsos, “Spotting misbehaviors in location-based social networks using tensors,” in Proceedings of the 23rd International Conference on World Wide Web.   ACM, 2014.
  • [14] M. R. de Araujo, P. M. P. Ribeiro, and C. Faloutsos, “Tensorcast: Forecasting with context using coupled tensors (best paper award),” in Data Mining (ICDM), 2017 IEEE International Conference on.   IEEE, 2017.
  • [15] K. Takeuchi, H. Kashima, and N. Ueda, “Autoregressive tensor factorization for spatio-temporal predictions,” in 2017 IEEE International Conference on Data Mining (ICDM).   IEEE, 2017.
  • [16] X. Wang, K. Liu, S. He, and J. Zhao, “Learning to represent review with tensor decomposition for spam detection.” in EMNLP, 2016.
  • [17] Y. Qiao, K. Niu, and Z. He, “Signal processing on heterogeneous network based on tensor decomposition,” in Network Infrastructure and Digital Content (IC-NIDC), 2016 IEEE International Conference on.   IEEE, 2016.
  • [18] Q. Song, X. Huang, H. Ge, J. Caverlee, and X. Hu, “Multi-aspect streaming tensor completion,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2017.
  • [19] E. Acar, D. M. Dunlavy, and T. G. Kolda, “A scalable optimization approach for fitting canonical tensor decompositions,” Journal of Chemometrics, vol. 25, no. 2, 2011.
  • [20] P. Paatero, “A weighted non-negative least squares algorithm for three-way ’PARAFAC’ factor analysis,” Chemometrics and Intelligent Laboratory Systems, vol. 38, no. 2, 1997.
  • [21] G. Tomasi and R. Bro, “A comparison of algorithms for fitting the parafac model,” Computational Statistics & Data Analysis, vol. 50, no. 7, pp. 1700–1734, 2006.
  • [22] R. A. Harshman and M. E. Lundy, “Uniqueness proof for a family of models sharing features of tucker’s three-mode factor analysis and parafac/candecomp,” Psychometrika, vol. 61, no. 1, 1996.
  • [23] R. Bro, “Multi-way analysis in the food industry: models, algorithms, and applications,” Ph.D. dissertation, 1998.
  • [24] K. Schittkowski, “Nlpqlp: A new fortran implementation of a sequential quadratic programming algorithm for parallel computing,” Report, Department of Mathematics, University of Bayreuth, 2001.
  • [25] S. Wright and J. Nocedal, “Numerical optimization,” Springer Science, vol. 35, no. 67-68, 1999.
  • [26] R. Lior et al., Data mining with decision trees: theory and applications.   World scientific, 2014, vol. 81.
  • [27] T. Kim, Y. Yue, S. Taylor, and I. Matthews, “A decision tree framework for spatiotemporal sequence prediction,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2015, pp. 577–586.
  • [28] L. Breiman, Classification and regression trees.   Routledge, 2017.
  • [29] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, “A survey of deep neural network architectures and their applications,” Neurocomputing, vol. 234, pp. 11–26, 2017.
  • [30] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning.   MIT press Cambridge, 2016, vol. 1.
  • [31] D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” The Journal of physiology, vol. 148, no. 3, 1959.
  • [32] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on.   IEEE, 2012.
  • [33] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao, “Time series classification using multi-channels deep convolutional neural networks,” in International Conference on Web-Age Information Management.   Springer, 2014.
  • [34] J. Charlier, R. State, and J. Hilger, “Non-negative paratuck2 tensor decomposition combined to lstm network for smart contracts profiling,” in Big Data and Smart Computing (BigComp), 2018 IEEE International Conference on.   IEEE, 2018, pp. 74–81.
  • [35] Y. Nesterov et al., “Gradient methods for minimizing composite objective function,” 2007.
  • [36] A. Defazio, F. Bach, and S. Lacoste-Julien, “Saga: A fast incremental gradient method with support for non-strongly convex composite objectives,” in Advances in neural information processing systems, 2014, pp. 1646–1654.
  • [37] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [38] C. G. Broyden, “The convergence of a class of double-rank minimization algorithms 1. general considerations,” IMA Journal of Applied Mathematics, vol. 6, no. 1, pp. 76–90, 1970.
  • [39] R. Fletcher, “A new approach to variable metric algorithms,” The computer journal, vol. 13, no. 3, pp. 317–322, 1970.
  • [40] D. Goldfarb, “A family of variable-metric methods derived by variational means,” Mathematics of computation, vol. 24, no. 109, pp. 23–26, 1970.
  • [41] D. F. Shanno, “Conditioning of quasi-newton methods for function minimization,” Mathematics of computation, vol. 24, no. 111, pp. 647–656, 1970.
  • [42] A. Hagberg, A. Kent, N. Lemons, and J. Neil, “Credential hopping in authentication graphs,” in 2014 International Conference on Signal-Image Technology Internet-Based Systems (SITIS).   IEEE Computer Society, Nov. 2014.