Leveraging Siamese Networks for One-Shot Intrusion Detection Model

06/27/2020 ∙ by Hanan Hindy, et al. ∙ 0

The use of supervised Machine Learning (ML) to enhance Intrusion Detection Systems has been the subject of significant research. Supervised ML is based upon learning by example, demanding significant volumes of representative instances for effective training and requiring the model to be retrained for every unseen cyber-attack class. However, retraining the models in-situ renders the network susceptible to attacks owing to the time-window required to acquire a sufficient volume of data. Although anomaly detection systems provide a coarse-grained defence against unseen attacks, these approaches are significantly less accurate and suffer from high false-positive rates. Here, a complementary approach referred to as 'One-Shot Learning' is detailed, whereby a limited number of examples of a new attack class is used to identify that class (out of many). The model classifies a new cyber-attack class without retraining. A Siamese Network is trained to differentiate between classes based on pair similarities, rather than features, allowing it to identify new and previously unseen attacks. The performance of a pre-trained model in classifying attack classes based on only one example is evaluated using three data sets. Results confirm the adaptability of the model in classifying unseen attacks and the trade-off between performance and the need for distinctive class representation.


1 Introduction

Intrusion Detection System (IDS) development has its roots in statistical models [28], and has recently evolved to the use of Machine Learning (ML) [5] based on hybrid models and adaptive techniques [19]. Developments to date have highlighted two fundamental considerations in the design of effective supervised ML-based IDS: (a) the availability of a large and representative historian of cyber-attacks consisting of many thousands of instances [23], and (b) the time window resulting from the need to retrain models after a new attack class has emerged, which renders the network open to damaging attacks. Supervised ML models are very accurate at identifying the cyber-attacks they have previously been trained to recognise, but significantly under-perform for new, unseen and ‘zero-day’ attacks. Anomaly detection approaches have been explored to address the issue and, whilst these schemes provide better performance against unseen attacks, their efficacy against known attacks is inferior to that of supervised ML approaches. Further, anomaly-based approaches are limited when multiple new attacks occur, as these are simply classified into the same anomalous group, restricting the range of attack-specific countermeasures that can be employed.

Here, the development and evaluation of an ML-enabled approach is reported that provides improved identification of a range of previously unseen attacks in the period between their onset and the deployment of a robust supervised ML model that informs on the most effective countermeasures. The methodology, referred to as One-Shot Learning, centres on the use of a Siamese Network, shown to be effective in identifying new classes based on one (or only a few) examples of a new class. An alternative approach is to create synthetic examples based on the domain knowledge of new attacks; however, this is challenging, requiring considerable time to replicate a suitable representation of an environment with appropriate parameters, and is subject to human error owing to cognitive biases.

One-Shot Learning was inspired by the generalisation learning ability of human beings. As discussed by Vinyals et al. [41], “Humans learn new concepts with very little supervision, yet our best deep learning systems need hundreds or thousands of examples”. Therefore, One-Shot learning models aim at classifying previously unseen classes using one instance. The idea is to rely on previously seen classes and learn patterns and similarities instead of fitting the ML model to fixed classes. Few-Shot (N-Shot) learning is similar to One-Shot learning, with the flexibility of using a few (N) instances to classify a class instead of one [36].

A Siamese Network is composed of two “twin” networks that are trained simultaneously to learn the similarity of two instances, called a pair. Leveraging this similarity-based learning, a previously unseen class can be added to the network without retraining. The initial stage of the development is the training phase. The Siamese Network is trained using similarities that discriminate between classes: benign traffic and the classes of known cyber-attacks. Any new traffic instance is then compared against all known classes (used during training) plus one additional class, for which only a limited number of examples are available, such as might be the case on the appearance of a new cyber-attack. This is achieved without any form of additional training.

The contributions of the paper are: (a) the use of a Siamese Network model to successfully classify cyber-attacks based on pair similarities, which has not been proposed for Cyber Security usage to date; (b) evaluation of the performance of the proposed model in detecting a new cyber-attack class based on one labelled instance without retraining; (c) comparison of the impact of using a few labelled instances of the new attack class on detection performance.

The remainder of the paper is organised as follows: Section 2 details the main features of Siamese Networks; Section 3 presents the methodology governing the training of the Siamese Network and explains its evaluation, showing the potential of the network to identify a new attack class based on a few (previously collected and labelled) examples of that attack class without retraining; Section 4 presents the properties of the data sets and their corresponding attack classes used in model development and performance evaluation; the performance of the model is assessed in Section 5; conclusions are drawn in Section 6.

2 Background

In supervised machine learning, a relationship exists between model complexity and the volume of training data; with too few training examples the model will over-fit, resulting in an unnecessarily complex model that produces poor results. Therefore, securing sufficient and representative data is a limiting factor in model development and performance [20]. In practice, accessing and/or generating sufficiently large and representative training examples is a complex challenge and may involve significant manual effort and processing time [29]. Nonetheless, there are publicly available data sets for training IDS systems, notably the CICIDS2017 and the NSL-KDD sets. These data sets are used to pre-train the Siamese Network and, subsequently, to evaluate the performance of the model in identifying a new class of attack after a limited number of samples of that class have been recorded.

An alternative approach is to utilise ‘Transfer Learning’ to mitigate the need for large volumes of training data [26, 44]. The premise of Transfer Learning, to solve a target problem where data are limited, is to create a model for a similar problem where large amounts of data are readily available. The initial model is then ‘transferred’ to the target problem and partially re-trained on the small data set. The rationale is that the initial training on the similar problem yields weights which discover features useful for the problem domain and hence applicable to the target problem; hence, after retraining, the model learns and generalises faster on the small data set [43]. Transfer Learning is a common approach in the image processing domain [14] where, for example, models are trained on the ImageNet data set [25, 30, 24]. Despite the potential of Transfer Learning as a viable solution, it does not eliminate the need for retraining.

One-Shot learning, first reported by Li Fei-Fei et al. [12], is inspired by human generalisation learning and has been applied in multiple domains, the most prominent being image and video processing [42, 45, 46]. It has also been used in other domains, such as robotics [4], language processing [49, 48] and drug discovery [1]. Based on the literature, the Siamese Network is the most frequently used architecture. Various architectures have been proposed and assessed as the building block for the twin network (i.e., CNN [8, 9], RNN [39] and GNN [16]). Other approaches include Matching Networks [41], Prototypical Networks [35], Imitation Learning [11] and Autoencoders [15], used particularly in the image processing domain but amenable to generalisation to other domains. To the best of the authors’ knowledge, the development reported here is the first proposing a One-Shot IDS model implementation.

2.1 Siamese Network Architecture

Siamese Networks were first introduced by Bromley et al. [3] in the 1990s to solve the problem of matching hand-written signatures, and were subsequently adapted to other domains. Popular implementations of Siamese Networks for image and video processing are presented by Koch et al. [22], Yao et al. [47] and Varior et al. [40]. Moreover, Siamese Networks have been implemented for Natural Language Processing (NLP) tasks [2, 50] and for the retrieval of similar questions [10].

Figure 1 depicts the Siamese network architecture. As shown, the network is composed of two identical sub-networks that share weights. Twin networks pass their output to a similarity module, which in turn is responsible for calculating the distance defining “how alike” the two inputs are. The output is compared to the given similarity (i.e. whether or not the pair are similar), the loss is calculated, and the weights are then adjusted.

Fig. 1: Siamese Network Architecture.
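The twin arrangement described above can be illustrated with a minimal NumPy sketch. It is not the architecture used in the paper: the layer sizes, the ReLU activation and the names `embed` and `distance` are assumptions chosen for illustration (the 31 inputs echo the CICIDS2017 feature count reported in Section 4). Weight sharing is obtained simply by routing both inputs through the same function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 31 input features, 16 hidden units, 8-d output.
W1 = rng.normal(scale=0.1, size=(31, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 8))
b2 = np.zeros(8)


def embed(x):
    """Forward pass of one twin. Both twins call this same function,
    so they share the weights W1, b1, W2, b2 by construction."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation (assumed)
    return hidden @ W2 + b2


def distance(x1, x2):
    """Similarity module: Euclidean distance between the twin outputs."""
    return float(np.linalg.norm(embed(x1) - embed(x2)))


a = rng.normal(size=31)
b = rng.normal(size=31)
print(distance(a, a))  # 0.0, identical inputs give identical outputs
print(distance(a, b))  # distance for two different random inputs
```

Because the two branches are literally the same function, the distance is symmetric in its arguments, which is the property the similarity module relies on.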

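Formally [22, 31], given a pair of inputs $(x_1, x_2)$ and a twin network $(N_1, N_2)$, such that $x_1$ is the input of $N_1$ and $x_2$ is the input of $N_2$, the similarity can be computed using the Euclidean distance (Equation 1):

$$d(x_1, x_2) = \lVert N_1(x_1) - N_2(x_2) \rVert_2 \tag{1}$$

such that $N_1(x_1)$ and $N_2(x_2)$ are the outputs of networks $N_1$ and $N_2$ respectively, since $N_1$ and $N_2$ are twin networks. Ultimately, the training goal is to minimise the overall loss as defined in Equation 2, for each given batch of input pairs $(x_1^{(i)}, x_2^{(i)})$ and label vector $y$, such that $y^{(i)} = 1$ if $x_1^{(i)}$ and $x_2^{(i)}$ belong to the same class and $y^{(i)} = 0$ otherwise.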

(2)

such that $\lambda$ is a regularisation parameter.

However, the loss function is sensitive to outliers (i.e. dissimilar pairs with large distances), which disproportionately affect the gradient estimation. An alternative loss function is the contrastive loss shown in Equation 3, proposed by Chopra, Hadsell and LeCun [7, 17]. In the contrastive loss, dissimilar pairs contribute only if their distance is within a specified margin [17], hence limiting the effect of large distances.

(3)

such that $m$ is a margin. In this study, the margin was set following [17].

After training, given any pair of instances, the network is capable of calculating the distance between them, which mirrors the degree of similarity of the pair; the lower the distance, the closer the pair. Batches of pairs are used to train the network. Note, however, that an equal number of similar and dissimilar pairs is used in each batch.
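As an illustration of the margin-based loss attributed to Chopra, Hadsell and LeCun [7, 17], commonly called the contrastive loss, the sketch below uses one widely used form, $y\,d^2 + (1-y)\max(m-d, 0)^2$, averaged over the batch. The label convention (1 for similar, 0 for dissimilar) follows the labels stated for Algorithm 2; the exact form and scaling used in the paper are not recoverable from the text, and the default `margin=1.0` is an assumption.

```python
import numpy as np


def contrastive_loss(y, d, margin=1.0):
    """Mean contrastive loss over a batch of pairs.

    y: labels, 1 for similar pairs and 0 for dissimilar pairs.
    d: distances between the twin outputs for each pair.
    margin: assumed value; dissimilar pairs farther apart than
            the margin contribute nothing to the loss.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    similar_term = y * d ** 2
    dissimilar_term = (1.0 - y) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(similar_term + dissimilar_term))


# A dissimilar pair beyond the margin is ignored; one inside it is penalised.
print(contrastive_loss([0], [2.0]))
print(contrastive_loss([0], [0.5]))
```

The `np.maximum` clamp is what limits the influence of dissimilar pairs with large distances, which is the motivation given for preferring this loss.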

Here, Feed-forward Artificial Neural Networks (ANN) are used as the building block of the twin network. The details of the architecture (i.e., the number of layers, neurons, etc.) are provided in Section 3.

3 Siamese Network Model

In this section, the proposed Siamese Network model is used as the One-Shot learning architecture. The performance of the network on classifying a new cyber-attack class without the need to retrain is evaluated with the new attack class represented by a limited number of labelled samples.

Fig. 2: Siamese Network for Intrusion Detection System (One-Shot).

Figure 2 shows the process of establishing the intrusion detection model based on one-shot learning and illustrates the methodology of assessing performance for new attack classes without retraining the model.

Given a data set, first, an attack class is chosen to act as the new cyber-attack; this class is excluded from the training process (Figure 2-(1)). Second, for each of the remaining classes, the instances are split into two halves, as shown in Figure 2-(2). Collectively, the first ‘half’ is used as a pool of instances to generate the training set pairs, both similar and dissimilar, as shown in Figure 2-(4); the second ‘half’ is used as the evaluation pool of instances.

The excluded class is used to mimic a real-life situation in which a new attack is detected with only a few labelled samples available. Therefore, the instances of the excluded class are split into two halves (Figure 2-(3)), the first half representing a pool of labelled instances and the second half a pool of unlabelled (new) instances.

Since the model relies on random pair generation, pairs are drawn randomly from the pools of instances. The rationale for having pools of instances and drawing pairs randomly is to avoid any selection bias, either during training (i.e. selecting similar and dissimilar pairs) or during evaluation of the new class (i.e. selecting the labelled instances that best represent this class). Furthermore, the uniqueness of the pairs (no duplicates) is ensured using a “set” data structure: a generated pair is added to the batch of pairs unless that pair is already contained within the set. This is demonstrated in Algorithm 2.

During evaluation, an instance is paired with one random instance from each class. The instances are drawn from the pool of testing instances, resulting in one pair per class. The similarity is then calculated for each pair. The instance is classified (labelled) based on the pair with the highest similarity (i.e. least distance).

As discussed in Section 5, to determine the trade-off between the number of labelled instances of the new attack class and accuracy, the process is repeated several times for each instance. Majority voting is then applied to deduce the instance label; the class with the most votes is used as the instance label (Figure 2-(7)).

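Input: Attacks Dataset
Output: Trained Siamese Network Evaluation

    excluded class ← random class such that it is an attack class
    for each remaining class: split the instances into a training pool and a testing pool
    for the excluded class: split the instances into a labelled pool and an unlabelled pool
    batch ← GetTrainingBatch(train_batch_size)
    Build Siamese Network with Random Weights
    for each training epoch do
        Update Siamese Network Weights based on batch
    end for
    Evaluate(test_batch_size)
Algorithm 1 Train and Test Siamese Network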

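Input: Dataset of classes, Batch Size
Output: Batch of similar and dissimilar pairs and associated labels (0: dissimilar, 1: similar)

function GetTrainingBatch(batch_size)
    pairs ← empty set
    similar pairs per class ← (batch_size / 2) / number of training classes
    dissimilar pairs per class combination ← (batch_size / 2) / number of combinations of two training classes
    for each training class c do
        repeat until the required number of similar pairs for c is reached
            (i1, i2) ← 2 random instances from the training pool of c
            if (i1, i2) is already in pairs then
                discard the pair and draw again
            end if
            add (i1, i2) to pairs with label Similar
        end repeat
    end for
    for each combination of two training classes (c1, c2) do
        repeat until the required number of dissimilar pairs for (c1, c2) is reached
            i1 ← random instance from the training pool of c1
            i2 ← random instance from the training pool of c2
            if (i1, i2) is already in pairs then
                discard the pair and draw again
            end if
            add (i1, i2) to pairs with label Dissimilar
        end repeat
    end for
    return pairs, labels
end function
Algorithm 2 Generate Training Batch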

Algorithm 1 summarises the overall process of training and testing the model. Initially, the data set is split as shown in Figure 2. The model is trained for a specified number of epochs with the generated batch of pairs as described in Algorithm 2. The batch size is based on the literature recommendation for the advisable Siamese Network training batch size [27, 22, 33]. It is important to note that the classes are equally represented in both the training and testing batches. Note that the data set should have at least 3 classes; otherwise, the model converges to a 50% similarity output and fails to train adequately. Algorithm 2 shows the training batch generation process.
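The batch generation described for Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the description rather than the authors' implementation: the function name `get_training_batch`, the representation of an instance as a `(class, index)` tuple, and the equal split of dissimilar pairs across class combinations are all assumptions. It also assumes that each pool is large enough to supply the requested number of unique pairs; otherwise the loops would not terminate.

```python
import random
from itertools import combinations


def get_training_batch(pool_sizes, batch_size, seed=None):
    """Return (pairs, labels) with half similar and half dissimilar pairs.

    pool_sizes: dict mapping class name -> number of training instances.
    An instance is represented as a (class_name, index) tuple.
    """
    rng = random.Random(seed)
    classes = sorted(pool_sizes)
    class_pairs = list(combinations(classes, 2))
    per_class = (batch_size // 2) // len(classes)
    per_class_pair = (batch_size // 2) // len(class_pairs)
    seen, pairs, labels = set(), [], []

    def add(pair, label):
        key = tuple(sorted(pair))  # (a, b) and (b, a) count as the same pair
        if key in seen:
            return False  # duplicate: caller draws again
        seen.add(key)
        pairs.append(pair)
        labels.append(label)
        return True

    for c in classes:  # similar pairs: two distinct instances of one class
        added = 0
        while added < per_class:
            i, j = rng.sample(range(pool_sizes[c]), 2)
            added += add(((c, i), (c, j)), 1)
    for c1, c2 in class_pairs:  # dissimilar pairs: one instance from each class
        added = 0
        while added < per_class_pair:
            i = rng.randrange(pool_sizes[c1])
            j = rng.randrange(pool_sizes[c2])
            added += add(((c1, i), (c2, j)), 0)
    return pairs, labels


pools = {"Normal": 1000, "DoS": 1000, "Probe": 1000, "R2L": 1000}
pairs, labels = get_training_batch(pools, batch_size=240, seed=1)
print(len(pairs), sum(labels))
```

With four training classes and a batch size of 240, the integer divisions are exact, so the batch holds 120 similar and 120 dissimilar pairs; for other sizes the batch may be slightly smaller than requested.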

An equal number of instances is used from each class for evaluation (Algorithm 3). For each new instance, a pair is formed with each class using the new instance and a random instance from that class. The similarity is calculated for each pair. The pair with the closest similarity contributes to the classification result. The process is performed several times and majority voting is used to collate the results. For the attack class that is excluded from training, the first half acts as the pool of labelled instances and the second half acts as the pool of new unlabelled instances.

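Input: Trained Siamese Network, Batch Size, Excluded Class
Output: Accuracy

Algorithm 3 Evaluate Model
function Evaluate(batch_size)
    correct ← 0
    instances per class ← batch_size / number of classes
    for each class c (including the excluded class) do
        for each of the instances per class do
            if c is the excluded class then
                x ← random instance from the unlabelled pool of the excluded class
            else
                x ← random instance from the testing pool of c
            end if
            for each vote do
                pair x with one random instance from the testing pool of each training class
                pair x with one random instance from the labelled pool of the excluded class
                compute the similarity of every pair
                vote ← class of the pair with the least distance
            end for
            if the class with the most votes is c then
                correct ← correct + 1
            end if
        end for
    end for
    accuracy ← correct / batch_size
    return accuracy
end function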
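The per-instance step of the evaluation, pairing a new instance with one random member of every class and taking a majority vote, can be sketched as below. The function name `classify`, the dictionary of pools, and the toy scalar distance are hypothetical and stand in for the trained Siamese Network; ties between classes are broken by the first class that reached the top count, which is an arbitrary choice made here.

```python
import random
from collections import Counter


def classify(instance, pools, distance, votes=5, seed=None):
    """Label an instance by majority vote over `votes` rounds.

    pools: dict mapping class name -> list of reference instances
           (a new class may hold as little as one labelled instance).
    distance: callable returning the dissimilarity of two instances.
    """
    rng = random.Random(seed)
    ballots = []
    for _ in range(votes):
        # One pair per class: the instance versus a random class member.
        candidates = {label: rng.choice(members) for label, members in pools.items()}
        nearest = min(candidates, key=lambda label: distance(instance, candidates[label]))
        ballots.append(nearest)
    return Counter(ballots).most_common(1)[0][0]


# Toy example with scalar "instances" and absolute difference as the distance.
pools = {"Normal": [0.0, 0.1, 0.2], "DoS": [5.0, 5.2], "NewAttack": [9.9]}
print(classify(9.5, pools, lambda a, b: abs(a - b), votes=5, seed=0))
```

Adding the new class only requires adding its labelled instances to `pools`; nothing in the distance function changes, which mirrors the claim that no retraining is needed.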

The model evaluation yields a Confusion Matrix (CM) that visualises the performance. A sample CM is presented in Table I. Each row of the CM represents a class; True Positive (TP) is the number of attack instances correctly classified as attack; True Negative (TN) is the number of normal instances correctly classified as normal; False Positive (FP) is the number of normal instances wrongly classified as attack; False Negative (FN) is the number of attack instances wrongly classified as normal.

Correct \ Predicted | Normal | Attack1 | Attack2 | Attack3 | Attack4
Normal | TN | FP1 | FP2 | FP3 | FP4
Attack1 | FN1 | TP11 | TP12 | TP13 | TP14
Attack2 | FN2 | TP21 | TP22 | TP23 | TP24
Attack3 | FN3 | TP31 | TP32 | TP33 | TP34
Attack4 | FN4 | TP41 | TP42 | TP43 | TP44
TABLE I: Sample Confusion Matrix

The overall accuracy is calculated as shown in Equation 4. True Positive Rate (TPR) and False Negative Rate (FNR) for each class are shown in Equation 5 and Equation 6 respectively; finally, True Negative Rate (TNR) and False Positive Rate (FPR) are calculated using Equation 7 and Equation 8 respectively.

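$$\text{Overall Accuracy} = \frac{TN + \sum_{i} TP_{ii}}{TN + \sum_{i} FP_{i} + \sum_{i} FN_{i} + \sum_{i}\sum_{j} TP_{ij}} \tag{4}$$
$$TPR_i = \frac{TP_{ii}}{FN_i + \sum_{j} TP_{ij}} \tag{5}$$
$$FNR_i = \frac{FN_i}{FN_i + \sum_{j} TP_{ij}} \tag{6}$$
$$TNR = \frac{TN}{TN + \sum_{i} FP_i} \tag{7}$$
$$FPR = \frac{\sum_{i} FP_i}{TN + \sum_{i} FP_i} \tag{8}$$

where $i$ and $j$ index the attack classes in the notation of Table I.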
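As a worked check of these metrics, the sketch below computes them from the counts reported in Table V (SSH excluded from training). The function names are hypothetical, and the definitions are inferred from the reported figures: the new-class FNR is taken as the share of new-class instances classified as Normal, and the Normal FPR as the share of Normal instances classified as any attack.

```python
import numpy as np

# Counts from Table V; rows are the correct class, columns the predicted class,
# in the order Normal, DoS (Hulk), DoS (Slowloris), FTP, SSH.
cm = np.array([
    [4711,    9,  103,  148, 1029],
    [  93, 5745,   33,   43,   86],
    [ 507,    0, 4668,  143,  682],
    [ 643,    1,  127, 4879,  350],
    [ 924,   34,  310,  350, 4382],
])


def overall_accuracy(cm):
    """Correctly classified instances (the diagonal) over all instances."""
    return float(np.trace(cm) / cm.sum())


def attack_rates(cm, i):
    """TPR and FNR of attack class i (row i; column 0 is Normal)."""
    row = cm[i]
    return float(row[i] / row.sum()), float(row[0] / row.sum())


def normal_rates(cm):
    """TNR and FPR of the Normal class (row 0)."""
    row = cm[0]
    return float(row[0] / row.sum()), float(row[1:].sum() / row.sum())


tpr, fnr = attack_rates(cm, 4)  # SSH is the new class
tnr, fpr = normal_rates(cm)
print(f"accuracy={overall_accuracy(cm):.2%} TPR={tpr:.2%} FNR={fnr:.2%} "
      f"TNR={tnr:.2%} FPR={fpr:.2%}")
```

The computed values agree with the five-vote row of Table VI (81.28%, 73.03%, 15.40%, 78.52% and 21.48%).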

4 Datasets

Three data sets are used to evaluate the proposed models: two benchmark IDS data sets, specifically CICIDS2017 and NSL-KDD, together with KDD Cup’99. The latter is used in comparison to NSL-KDD to demonstrate the effect of clean data when generating training pairs and when introducing new attacks to the trained model.

In each data set, all classes except one are used to train the network. The training classes include normal/benign traffic and attack classes. The instances of each of these classes act as a pool used to generate similar and dissimilar pairs. The remaining class is used to simulate a new attack, mimicking situations in which limited data is available for a new attack. The pair generation details and the experiments are further discussed in Section 3.

An overview of each data set is presented in the following subsections.

4.1 CICIDS2017

CICIDS2017  [32] is a recent data set generated by the Canadian Institute for Cyber-security (CIC) comprising up-to-date benign, insider and outsider attacks. Traffic flows were generated and labelled using the provided ‘.pcap’ files. Table II lists the attacks used and the number of instances/flows for each.

No. | Class | # of Occurrences
1 | Normal | 248607 (90.50%)
2 | DoS (Hulk) | 14427 (5.25%)
3 | DoS (Slowloris) | 2840 (1.03%)
4 | FTP Brute Force | 5228 (1.9%)
5 | SSH Brute Force | 3627 (1.32%)
TABLE II: CICIDS Classes and Corresponding Number of Occurrences (1)

4.2 KDD Cup’99

The KDD Cup’99 [18], although old, is still considered the classic benchmark data set used in the evaluation of IDS performance. More than 60% of the research in the past decade (2008 - 2018) has been evaluated using KDD’99 [19]. KDD Cup’99 covers 4 attack classes alongside normal activity. The attacks contained in the data set are: Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R) and probing.

The KDD Cup’99 data set is relatively large, however, the provider has made available a reduced subset of ~10% [21]. For the purposes of evaluation here, only the smaller subset is used. Table III shows the number of instances per class for the KDD Cup’99 data set.

No. | Class | # of Occurrences
1 | Normal | 97278 (19.70%)
2 | DoS | 391458 (79.24%)
3 | Probe | 4107 (0.82%)
4 | U2R | 1128 (0.23%)
5 | R2L | 52 (0.01%)
TABLE III: KDD Cup’99 Classes and Corresponding Number of Occurrences

4.3 NSL-KDD

The NSL-KDD [13] data set was proposed by the CIC to overcome the problems of the KDD Cup’99 set discussed by Tavallaee et al. [37]. Similar to KDD Cup’99, NSL-KDD covers 4 attack classes alongside normal activity. NSL-KDD is used for evaluating the effect of enhancing and filtering a data set on the similarity learning and performance. Table IV shows the number of instances per class for the NSL-KDD data set.

No. | Class | # of Occurrences
1 | Normal | 67343 (53.46%)
2 | DoS | 45927 (36.47%)
3 | Probe | 11656 (9.25%)
4 | U2R | 995 (0.78%)
5 | R2L | 52 (0.04%)
TABLE IV: NSL-KDD Classes and Corresponding Number of Occurrences

The NSL-KDD and KDD Cup’99 data sets have already been pre-processed, with 42 features extracted, giving a total of 118 features after encoding the categorical features. For CICIDS2017, 31 bidirectional flow features are extracted. It is worth noting that no feature engineering or selection is performed, to ensure that the class excluded from training does not indirectly influence the feature set.

Recent surveys have examined the use of ML for IDS [6]. Furthermore, Thomas and Pavithran [38] study the recent ML techniques evaluated using the NSL-KDD data set, while Panwar et al. [34] evaluate the usage of ML on the CICIDS2017 data set. Although there are various manuscripts using ML for IDS, comparing the proposed model with recent IDS models is not applicable: the proposed model leverages One-Shot learning and therefore cannot be compared directly with classical classification models.

5 One-Shot Evaluation

The evaluation specifies how accurately the proposed network can classify both classes used in training and new attack classes without the need for retraining. The model leverages similarity-based learning. The new attack class is represented using one sample to mimic the labelling process of new attacks.

For each data set evaluation, multiple experiments are conducted. Specifically, one experiment is conducted per attack class, in order to evaluate the performance of the Siamese Network when using a different set of attack classes for training and evaluation. In each experiment, a separate attack class is excluded, one at a time. The CM is presented alongside the overall model accuracy for each experiment.

The results of the evaluation of the performance impact of the number of labelled samples of the new attack class are presented in terms of overall accuracy, new attack True Positive Rate (TPR) and False Negative Rate (FNR), and Normal True Negative Rate (TNR) and False Positive Rate (FPR), listed using 1, 5, 10, 15, 20, 25 and 30 instances for majority voting. The CMs use 5 instances.

First, the CMs of the CICIDS2017 One-Shot evaluation, excluding the SSH class and the FTP class, are presented in Table V and Table VII respectively. The overall accuracy is 81.28% and 82.5% respectively. The results demonstrate the capability of the network to adapt to the emergence of a new cyber-attack after training. It is important to note that the new attack class TPR is 73.03% and 70.03% for SSH and FTP respectively. Moreover, the added class demonstrates low FNRs, specifically 8.63% and 15.40% for FTP and SSH respectively. On inspection of Table VI and Table VIII, it is evident that using five labelled instances of the new attack class results in an increase in both the overall accuracy and the TPR. Using only 1 labelled instance demonstrates comparably poorer performance owing to the randomness of the instance selection, which could result in either a good or a bad class representative. However, using 5 random labelled instances boosts performance, reinforcing the importance of having distinctive class representatives.

The remainder of the CICIDS2017 performance evaluation results are characterised by similar behaviour and are listed as follows. DoS (Hulk) results are presented in Table IX and Table X, while DoS (Slowloris) in Table XI and Table XII.

Correct \ Predicted | Normal | DoS (Hulk) | DoS (Slowloris) | FTP | SSH
Normal | 4711 (78.52%) | 9 (0.15%) | 103 (1.72%) | 148 (2.47%) | 1029 (17.15%)
DoS (Hulk) | 93 (1.55%) | 5745 (95.75%) | 33 (0.55%) | 43 (0.72%) | 86 (1.43%)
DoS (Slowloris) | 507 (8.45%) | 0 (0%) | 4668 (77.8%) | 143 (2.38%) | 682 (11.37%)
FTP | 643 (10.72%) | 1 (0.02%) | 127 (2.12%) | 4879 (81.32%) | 350 (5.83%)
SSH | 924 (15.4%) | 34 (0.57%) | 310 (5.17%) | 350 (5.83%) | 4382 (73.03%)
Overall accuracy: 81.28%
TABLE V: CICIDS2017 One-Shot Confusion Matrix (SSH not in Training)

No. of Votes | Overall Accuracy | New Class (SSH) TPR | New Class (SSH) FNR | Normal TNR | Normal FPR
1 | 72.72% | 64.10% | 16.43% | 63.35% | 36.65%
5 | 81.28% | 73.03% | 15.40% | 78.52% | 21.48%
10 | 82.56% | 77.82% | 13.40% | 79.95% | 20.05%
15 | 82.58% | 78.43% | 13.03% | 79.92% | 20.08%
20 | 82.49% | 78.33% | 13.18% | 79.97% | 20.03%
25 | 82.43% | 78.30% | 13.25% | 79.78% | 20.22%
30 | 82.49% | 78.45% | 13.13% | 79.97% | 20.03%
TABLE VI: CICIDS2017 One-Shot Accuracy (SSH not in Training) Using Different Votes

Correct \ Predicted | Normal | DoS (Hulk) | DoS (Slowloris) | FTP | SSH
Normal | 5231 (87.18%) | 3 (0.05%) | 152 (2.53%) | 189 (3.15%) | 425 (7.08%)
DoS (Hulk) | 70 (1.17%) | 5755 (95.92%) | 48 (0.8%) | 15 (0.25%) | 112 (1.87%)
DoS (Slowloris) | 424 (7.07%) | 1 (0.02%) | 4433 (73.88%) | 485 (8.08%) | 657 (10.95%)
FTP | 518 (8.63%) | 1 (0.02%) | 659 (10.98%) | 4202 (70.03%) | 620 (10.33%)
SSH | 546 (9.1%) | 3 (0.05%) | 198 (3.3%) | 124 (2.07%) | 5129 (85.48%)
Overall accuracy: 82.5%
TABLE VII: CICIDS2017 One-Shot Confusion Matrix (FTP Not in Training)

No. of Votes | Overall Accuracy | New Class (FTP) TPR | New Class (FTP) FNR | Normal TNR | Normal FPR
1 | 72.91% | 59.65% | 8.03% | 72.83% | 27.17%
5 | 82.5% | 70.03% | 8.63% | 87.18% | 12.82%
10 | 84.57% | 72.8% | 8.32% | 87.70% | 12.30%
15 | 85.47% | 76.72% | 8.12% | 87.40% | 12.60%
20 | 85.78% | 77.58% | 8.10% | 87.23% | 12.77%
25 | 85.86% | 78.27% | 8.10% | 86.92% | 13.08%
30 | 85.94% | 78.48% | 8.00% | 86.73% | 13.27%
TABLE VIII: CICIDS2017 One-Shot Accuracy (FTP not in Training) Using Different Votes

Correct \ Predicted | Normal | DoS (Hulk) | DoS (Slowloris) | FTP | SSH
Normal | 4314 (71.9%) | 1095 (18.25%) | 174 (2.9%) | 113 (1.88%) | 304 (5.07%)
DoS (Hulk) | 78 (1.3%) | 5708 (95.13%) | 60 (1%) | 58 (0.97%) | 96 (1.6%)
DoS (Slowloris) | 451 (7.52%) | 51 (0.85%) | 4767 (79.45%) | 111 (1.85%) | 620 (10.33%)
FTP | 624 (10.4%) | 171 (2.85%) | 138 (2.3%) | 4521 (75.35%) | 546 (9.1%)
SSH | 597 (9.95%) | 26 (0.43%) | 245 (4.08%) | 198 (3.3%) | 4934 (82.23%)
Overall accuracy: 80.81%
TABLE IX: CICIDS2017 One-Shot Confusion Matrix (DoS (Hulk) Not in Training)

No. of Votes | Overall Accuracy | New Class (Hulk) TPR | New Class (Hulk) FNR | Normal TNR | Normal FPR
1 | 72.28% | 91.07% | 4.90% | 58.05% | 41.95%
5 | 80.81% | 95.13% | 1.30% | 71.90% | 28.10%
10 | 82.59% | 95.22% | 1.22% | 75.58% | 24.42%
15 | 82.54% | 95.23% | 1.20% | 74.67% | 25.33%
20 | 82.86% | 95.2% | 1.20% | 76.02% | 23.98%
25 | 82.76% | 95.2% | 1.15% | 75.50% | 24.50%
30 | 82.93% | 95.18% | 1.22% | 76.15% | 23.85%
TABLE X: CICIDS2017 One-Shot Accuracy (DoS (Hulk) not in Training) Using Different Votes

Correct \ Predicted | Normal | DoS (Hulk) | DoS (Slowloris) | FTP | SSH
Normal | 5307 (88.45%) | 6 (0.1%) | 459 (7.65%) | 64 (1.07%) | 164 (2.73%)
DoS (Hulk) | 37 (0.62%) | 5794 (96.57%) | 65 (1.08%) | 53 (0.88%) | 51 (0.85%)
DoS (Slowloris) | 574 (9.57%) | 26 (0.43%) | 4024 (67.07%) | 582 (9.7%) | 794 (13.23%)
FTP | 482 (8.03%) | 1 (0.02%) | 598 (9.97%) | 4639 (77.32%) | 280 (4.67%)
SSH | 446 (7.43%) | 0 (0%) | 817 (13.62%) | 181 (3.02%) | 4556 (75.93%)
Overall accuracy: 81.07%
TABLE XI: CICIDS2017 One-Shot Confusion Matrix (DoS (Slowloris) Not in Training)

No. of Votes | Overall Accuracy | New Class (Slowloris) TPR | New Class (Slowloris) FNR | Normal TNR | Normal FPR
1 | 72.28% | 50.97% | 11.50% | 72.65% | 27.35%
5 | 80.81% | 67.07% | 9.57% | 88.45% | 11.55%
10 | 82.59% | 71.38% | 7.38% | 89.48% | 10.52%
15 | 82.54% | 72.2% | 7.18% | 89.37% | 10.63%
20 | 82.86% | 72.77% | 6.85% | 89.67% | 10.33%
25 | 82.76% | 72.93% | 6.58% | 89.65% | 10.35%
30 | 82.93% | 72.82% | 6.68% | 89.70% | 10.30%
TABLE XII: CICIDS2017 One-Shot Accuracy (DoS (Slowloris) not in Training) Using Different Votes

The CMs of the KDD Cup’99 and NSL-KDD One-Shot evaluations, excluding the DoS attack from training, are presented in Table XIII and Table XV respectively; the overall accuracies are 76.67% and 77.99%. It is important to note, however, that the False Negative Rates for the new class (i.e. DoS) are 26.38% for KDD Cup’99 and 9.87% for NSL-KDD. In addition to the observations arising from the CICIDS2017 evaluation, these results highlight two further elements: (a) the Siamese Network did not find a high similarity between the new attack and the normal instances; (b) the new attack class TPR in the NSL-KDD results is significantly higher than in KDD Cup’99 (78.87% compared to 40.28%), because NSL-KDD is an enhanced version of KDD Cup’99 (filtered, with duplicate instances removed). Given that the new class is not used in the training phase and the similarity is only calculated from a few instances, a better representation of instances improves performance (i.e. NSL-KDD instances). Results confirm that new labelled instances need to be appropriate representatives.

For completeness, the remaining NSL-KDD and KDD Cup’99 results, which demonstrate similar performance, are listed as follows: the results excluding Probe are listed in Table XVII, Table XVIII, Table XIX and Table XX; Table XXV, Table XXVI, Table XXVII and Table XXVIII present the results when excluding R2L; finally, the results excluding U2R are in Table XXI, Table XXII, Table XXIII and Table XXIV.

Correct \ Predicted | Normal | DoS | Probe | R2L | U2R
Normal | 4562 (76.03%) | 243 (4.05%) | 522 (8.7%) | 579 (9.65%) | 94 (1.57%)
DoS | 1583 (26.38%) | 2417 (40.28%) | 1831 (30.52%) | 168 (2.8%) | 1 (0.02%)
Probe | 159 (2.65%) | 214 (3.57%) | 5367 (89.45%) | 242 (4.03%) | 18 (0.3%)
R2L | 56 (0.93%) | 275 (4.58%) | 10 (0.17%) | 5571 (92.85%) | 88 (1.47%)
U2R | 17 (0.28%) | 205 (3.42%) | 655 (10.92%) | 40 (0.67%) | 5083 (84.72%)
Overall accuracy: 76.67%
TABLE XIII: KDD One-Shot Confusion Matrix (DoS Not in Training)

No. of Votes | Overall Accuracy | New Class (DoS) TPR | New Class (DoS) FNR | Normal TNR | Normal FPR
1 | 66.89% | 41.67% | 22.50% | 66.35% | 33.65%
5 | 76.67% | 40.28% | 26.38% | 76.03% | 23.97%
10 | 77.57% | 40.07% | 27.25% | 76.10% | 23.90%
15 | 77.67% | 39.9% | 27.32% | 76.02% | 23.98%
20 | 77.68% | 39.93% | 27.38% | 76.02% | 23.98%
25 | 77.68% | 39.87% | 27.40% | 76.07% | 23.93%
30 | 77.68% | 39.88% | 27.40% | 76.03% | 23.97%
TABLE XIV: KDD One-Shot Accuracy (DoS not in Training) Using Different Votes

Correct \ Predicted | Normal | DoS | Probe | R2L | U2R
Normal | 5593 (93.22%) | 61 (1.02%) | 136 (2.27%) | 122 (2.03%) | 88 (1.47%)
DoS | 592 (9.87%) | 4732 (78.87%) | 653 (10.88%) | 12 (0.2%) | 11 (0.18%)
Probe | 67 (1.12%) | 3305 (55.08%) | 2595 (43.25%) | 19 (0.32%) | 14 (0.23%)
R2L | 212 (3.53%) | 7 (0.12%) | 27 (0.45%) | 5692 (94.87%) | 62 (1.03%)
U2R | 486 (8.1%) | 6 (0.1%) | 31 (0.52%) | 693 (11.55%) | 4784 (79.73%)
Overall accuracy: 77.99%
TABLE XV: NSL-KDD One-Shot Confusion Matrix (DoS Not in Training)

No. of Votes | Overall Accuracy | New Class (DoS) TPR | New Class (DoS) FNR | Normal TNR | Normal FPR
1 | 72.75% | 67.35% | 9.05% | 84.87% | 15.13%
5 | 77.99% | 78.87% | 9.87% | 93.22% | 6.78%
10 | 77.7% | 84.62% | 9.87% | 93.35% | 6.65%
15 | 79.05% | 83.78% | 9.87% | 93.32% | 6.68%
20 | 78.63% | 85.25% | 9.87% | 93.37% | 6.63%
25 | 79.49% | 84.62% | 9.87% | 93.35% | 6.65%
30 | 79.12% | 85.37% | 9.87% | 93.35% | 6.65%
TABLE XVI: NSL-KDD One-Shot Accuracy (DoS not in Training) Using Different Votes

Correct \ Predicted | Normal | DoS | Probe | R2L | U2R
Normal | 5389 (89.82%) | 89 (1.48%) | 195 (3.25%) | 245 (4.08%) | 82 (1.37%)
DoS | 37 (0.62%) | 5842 (97.37%) | 95 (1.58%) | 21 (0.35%) | 5 (0.08%)
Probe | 1697 (28.28%) | 2571 (42.85%) | 565 (9.42%) | 948 (15.8%) | 219 (3.65%)
R2L | 54 (0.9%) | 0 (0%) | 55 (0.92%) | 5800 (96.67%) | 91 (1.52%)
U2R | 263 (4.38%) | 0 (0%) | 21 (0.35%) | 720 (12%) | 4996 (83.27%)
Overall accuracy: 75.31%
TABLE XVII: NSL-KDD One-Shot Confusion Matrix (Probe Not in Training)


| No. of Votes | Overall Accuracy | New Class (Probe) TPR | New Class (Probe) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 70.62% | 18.80% | 24.78% | 77.53% | 22.47% |
| 5 | 75.31% | 9.42% | 28.28% | 89.82% | 10.18% |
| 10 | 75.2% | 4.83% | 28.82% | 91.08% | 8.92% |
| 15 | 75.12% | 4.05% | 29.08% | 91.18% | 8.82% |
| 20 | 75.11% | 3.47% | 29.20% | 91.45% | 8.55% |
| 25 | 75% | 3.02% | 29.55% | 91.35% | 8.65% |
| 30 | 74.94% | 2.68% | 29.68% | 91.33% | 8.67% |

TABLE XVIII: NSL-KDD One-Shot Accuracy (Probe not in Training) Using Different Votes

| Correct \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 4515 (75.25%) | 16 (0.27%) | 383 (6.38%) | 1016 (16.93%) | 70 (1.17%) |
| DoS | 18 (0.3%) | 5896 (98.27%) | 81 (1.35%) | 4 (0.07%) | 1 (0.02%) |
| Probe | 719 (11.98%) | 3707 (61.78%) | 612 (10.2%) | 941 (15.68%) | 21 (0.35%) |
| R2L | 26 (0.43%) | 0 (0%) | 16 (0.27%) | 5946 (99.1%) | 12 (0.2%) |
| U2R | 55 (0.92%) | 37 (0.62%) | 264 (4.4%) | 943 (15.72%) | 4701 (78.35%) |

Overall accuracy: 72.23%

TABLE XIX: KDD One-Shot Confusion Matrix (Probe Not in Training)

| No. of Votes | Overall Accuracy | New Class (Probe) TPR | New Class (Probe) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 66.72% | 15.72% | 11.77% | 65.72% | 34.28% |
| 5 | 72.23% | 10.2% | 11.98% | 75.25% | 24.75% |
| 10 | 72.59% | 5.9% | 13.30% | 78.65% | 21.35% |
| 15 | 72.35% | 4.82% | 13.08% | 78.57% | 21.43% |
| 20 | 72.26% | 3.58% | 13.50% | 79.20% | 20.80% |
| 25 | 72.17% | 3.05% | 13.55% | 79.23% | 20.77% |
| 30 | 72.07% | 2.17% | 13.98% | 79.62% | 20.38% |

TABLE XX: KDD One-Shot Accuracy (Probe not in Training) Using Different Votes

| Correct \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 5199 (86.65%) | 24 (0.4%) | 148 (2.47%) | 530 (8.83%) | 99 (1.65%) |
| DoS | 15 (0.25%) | 5799 (96.65%) | 36 (0.6%) | 26 (0.43%) | 124 (2.07%) |
| Probe | 90 (1.5%) | 242 (4.03%) | 5416 (90.27%) | 236 (3.93%) | 16 (0.27%) |
| R2L | 2526 (42.1%) | 1 (0.02%) | 142 (2.37%) | 2759 (45.98%) | 572 (9.53%) |
| U2R | 852 (14.2%) | 3 (0.05%) | 0 (0%) | 270 (4.5%) | 4875 (81.25%) |

Overall accuracy: 80.16%

TABLE XXI: NSL-KDD One-Shot Confusion Matrix (R2L Not in Training)

| No. of Votes | Overall Accuracy | New Class (R2L) TPR | New Class (R2L) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 74.5% | 46.05% | 38.13% | 74.73% | 25.27% |
| 5 | 80.16% | 45.98% | 42.10% | 86.65% | 13.35% |
| 10 | 80.79% | 46.82% | 41.58% | 88.07% | 11.93% |
| 15 | 81.09% | 49.02% | 39.88% | 87.72% | 12.28% |
| 20 | 81% | 48.62% | 40.38% | 87.90% | 12.10% |
| 25 | 80.95% | 48.37% | 40.63% | 87.88% | 12.12% |
| 30 | 80.91% | 48.2% | 40.93% | 87.93% | 12.07% |

TABLE XXII: NSL-KDD One-Shot Accuracy (R2L not in Training) Using Different Votes

| Correct \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 4288 (71.47%) | 1 (0.02%) | 400 (6.67%) | 730 (12.17%) | 581 (9.68%) |
| DoS | 10 (0.17%) | 5909 (98.48%) | 72 (1.2%) | 9 (0.15%) | 0 (0%) |
| Probe | 90 (1.5%) | 160 (2.67%) | 5338 (88.97%) | 165 (2.75%) | 247 (4.12%) |
| R2L | 1702 (28.37%) | 2 (0.03%) | 1344 (22.4%) | 2148 (35.8%) | 804 (13.4%) |
| U2R | 527 (8.78%) | 1 (0.02%) | 682 (11.37%) | 213 (3.55%) | 4577 (76.28%) |

Overall accuracy: 74.2%

TABLE XXIII: KDD One-Shot Confusion Matrix (R2L Not in Training)

| No. of Votes | Overall Accuracy | New Class (R2L) TPR | New Class (R2L) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 67.75% | 38.48% | 25.95% | 59.65% | 40.35% |
| 5 | 74.2% | 35.8% | 28.37% | 71.47% | 28.53% |
| 10 | 77.27% | 42.22% | 23.85% | 74.38% | 25.62% |
| 15 | 78.34% | 46.65% | 22.05% | 74.50% | 25.50% |
| 20 | 78.94% | 49.18% | 21.45% | 74.62% | 25.38% |
| 25 | 79.44% | 51.32% | 20.72% | 74.65% | 25.35% |
| 30 | 79.87% | 53.35% | 20.65% | 74.55% | 25.45% |

TABLE XXIV: KDD One-Shot Accuracy (R2L not in Training) Using Different Votes

| Correct \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 4530 (75.5%) | 127 (2.12%) | 76 (1.27%) | 237 (3.95%) | 1030 (17.17%) |
| DoS | 120 (2%) | 5771 (96.18%) | 49 (0.82%) | 16 (0.27%) | 44 (0.73%) |
| Probe | 43 (0.72%) | 304 (5.07%) | 5574 (92.9%) | 69 (1.15%) | 10 (0.17%) |
| R2L | 403 (6.72%) | 1 (0.02%) | 27 (0.45%) | 5238 (87.3%) | 331 (5.52%) |
| U2R | 2191 (36.52%) | 0 (0%) | 221 (3.68%) | 1589 (26.48%) | 1999 (33.32%) |

Overall accuracy: 77.04%

TABLE XXV: NSL-KDD One-Shot Confusion Matrix (U2R Not in Training)

| No. of Votes | Overall Accuracy | New Class (U2R) TPR | New Class (U2R) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 72.42% | 34.37% | 35.55% | 66.58% | 33.42% |
| 5 | 77.04% | 33.32% | 36.52% | 75.50% | 24.50% |
| 10 | 77.08% | 30.42% | 36.95% | 77.85% | 22.15% |
| 15 | 77.19% | 30.2% | 36.70% | 78.22% | 21.78% |
| 20 | 77.12% | 29.37% | 36.67% | 78.52% | 21.48% |
| 25 | 77.14% | 28.85% | 36.72% | 78.87% | 21.13% |
| 30 | 77.12% | 28.3% | 37.10% | 79.25% | 20.75% |

TABLE XXVI: NSL-KDD One-Shot Accuracy (U2R not in Training) Using Different Votes

| Correct \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 4146 (69.1%) | 5 (0.08%) | 440 (7.33%) | 796 (13.27%) | 613 (10.22%) |
| DoS | 7 (0.12%) | 5921 (98.68%) | 59 (0.98%) | 6 (0.1%) | 7 (0.12%) |
| Probe | 53 (0.88%) | 384 (6.4%) | 5449 (90.82%) | 59 (0.98%) | 55 (0.92%) |
| R2L | 35 (0.58%) | 0 (0%) | 13 (0.22%) | 5849 (97.48%) | 103 (1.72%) |
| U2R | 958 (15.97%) | 1 (0.02%) | 669 (11.15%) | 3022 (50.37%) | 1350 (22.5%) |

Overall accuracy: 75.72%

TABLE XXVII: KDD One-Shot Confusion Matrix (U2R Not in Training)

| No. of Votes | Overall Accuracy | New Class (U2R) TPR | New Class (U2R) FNR | Normal TNR | Normal FPR |
|---|---|---|---|---|---|
| 1 | 70.69% | 21.40% | 17.28% | 59.27% | 40.73% |
| 5 | 75.72% | 22.5% | 15.97% | 69.10% | 30.90% |
| 10 | 76.26% | 21.82% | 17.17% | 72.18% | 27.82% |
| 15 | 76.33% | 21.83% | 17.15% | 72.52% | 27.48% |
| 20 | 76.31% | 21.48% | 17.52% | 72.72% | 27.28% |
| 25 | 76.34% | 21.45% | 17.55% | 72.77% | 27.23% |
| 30 | 76.33% | 21.27% | 17.73% | 72.90% | 27.10% |

TABLE XXVIII: KDD One-Shot Accuracy (U2R not in Training) Using Different Votes

6 Conclusion and Future Work

The paper presents an Intrusion Detection Siamese Network framework capable of classifying new cyber-attacks based on a limited number of labelled instances (One-Shot). The model was evaluated on three data sets: CICIDS2017, KDD Cup’99, and NSL-KDD, an enhancement of KDD Cup’99.

The evaluation reconfirms that particular care must be taken when creating the training set to ensure an equal number of training pairs for every class combination. This requirement, in turn, raises the challenge of an exploding number of combinations between all instances. To mitigate this growth, distinct pairs are chosen to create large batches in the region of 30,000 pairs. During evaluation, comparing similarity against a single instance for each class resulted in noisy predictions owing to randomness; this was obviated by selecting multiple random instances from each class and aggregating their predictions using majority voting.
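
The balanced pair selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the evaluation: the function name `sample_balanced_pairs`, the dictionary input format and the rejection-sampling strategy are all hypothetical, and the batch size is left as a parameter (the paper uses batches in the region of 30,000 pairs).

```python
import itertools
import random


def sample_balanced_pairs(instances_by_class, batch_size, seed=0):
    """Draw distinct pairs with an equal share for every class combination.

    instances_by_class: dict mapping a class label to a list of instances.
    Returns a shuffled list of (left, right, is_similar) tuples.
    """
    rng = random.Random(seed)
    labels = sorted(instances_by_class)
    # Every unordered class combination, including each class with itself.
    combos = list(itertools.combinations_with_replacement(labels, 2))
    per_combo = batch_size // len(combos)
    pairs = []
    for a, b in combos:
        left, right = instances_by_class[a], instances_by_class[b]
        # Upper bound on distinct pairs, so the sampling loop always terminates.
        if a == b:
            limit = len(left) * (len(left) - 1) // 2
        else:
            limit = len(left) * len(right)
        if per_combo > limit:
            raise ValueError(f"not enough distinct pairs for ({a}, {b})")
        chosen = set()
        while len(chosen) < per_combo:
            i, j = rng.randrange(len(left)), rng.randrange(len(right))
            if a == b:
                if i == j:
                    continue  # an instance is never paired with itself
                i, j = min(i, j), max(i, j)  # (i, j) and (j, i) are one pair
            chosen.add((i, j))
        pairs.extend((left[i], right[j], a == b) for i, j in sorted(chosen))
    rng.shuffle(pairs)
    return pairs


# Toy usage with three classes of 20 two-dimensional instances each.
toy = {name: [[float(c), float(k)] for k in range(20)]
       for c, name in enumerate(["Normal", "DoS", "Probe"])}
batch = sample_balanced_pairs(toy, batch_size=60)
print(len(batch), "pairs,", sum(s for _, _, s in batch), "of them similar")
```

Sampling index pairs, rather than enumerating every combination of instances, keeps memory use proportional to the batch size, which is the point of restricting training to a fixed number of distinct pairs.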

Results demonstrate the ability of the proposed architecture to classify cyber-attacks based on learning from similarity. Moreover, the results highlight the need for representative instances of the new attack class. Furthermore, evidence is provided to confirm the ability of One-Shot learning methodologies to adapt to new cyber-attacks without retraining when only a few instances of a new attack are available. An overall accuracy of between 80% and 85% was achieved for the CICIDS2017 data set, demonstrating acceptable accuracy in detecting previously unseen attacks. The overall accuracy reached above 75% for the KDD Cup’99 and NSL-KDD data sets. Importantly for the application, this overall accuracy was achieved at a low FNR for the new attack classes.
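
How a new class can be added without retraining, together with the majority voting used during evaluation, can be illustrated with a small sketch. It rests on several assumptions: the Euclidean distance stands in for the dissimilarity score that the trained Siamese Network would output, and the names `classify_with_votes` and `support_by_class` are hypothetical. It is not the evaluation code of the paper.

```python
import random
from collections import Counter


def euclidean(a, b):
    """Stand-in for the trained network's dissimilarity score (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def classify_with_votes(query, support_by_class, n_votes,
                        distance=euclidean, seed=0):
    """Classify `query` by majority voting over repeated one-shot comparisons.

    In each round, one random labelled instance per class is drawn and the
    class of the closest instance receives a vote.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_votes):
        refs = {label: rng.choice(items)
                for label, items in support_by_class.items()}
        winner = min(refs, key=lambda label: distance(query, refs[label]))
        votes[winner] += 1
    # Ties are resolved in favour of the class that received a vote first.
    return votes.most_common(1)[0][0]


# Labelled instances for the classes known so far (toy values).
support = {
    "Normal": [[0.0, 0.0], [0.1, 0.0]],
    "DoS": [[5.0, 5.0], [5.1, 4.9]],
}
print(classify_with_votes([0.05, 0.0], support, n_votes=5))

# A new attack class is introduced by adding a few labelled instances;
# the similarity function itself is left unchanged (no retraining).
support["Probe"] = [[10.0, 0.0], [9.9, 0.2]]
print(classify_with_votes([9.8, 0.1], support, n_votes=5))
```

The design choice worth noting is that the set of classes lives entirely in the labelled reference instances, so extending it requires no change to the similarity function.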

References

  • [1] H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande (2017) Low data drug discovery with One-Shot learning. ACS Central Science 3 (4), pp. 283–293.
  • [2] Y. Benajiba, J. Sun, Y. Zhang, L. Jiang, Z. Weng, and O. Biran (2019) Siamese networks for semantic pattern similarity. In 2019 IEEE 13th International Conference on Semantic Computing (ICSC), pp. 191–194.
  • [3] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah (1994) Signature verification using a “Siamese” time delay neural network. In Advances in Neural Information Processing Systems, pp. 737–744.
  • [4] J. Bruce, N. Sünderhauf, P. Mirowski, R. Hadsell, and M. Milford (2017) One-Shot reinforcement learning for robot navigation with interactive replay. arXiv preprint arXiv:1711.10137.
  • [5] A. L. Buczak and E. Guven (2016) A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials 18 (2), pp. 1153–1176. ISSN 1553-877X.
  • [6] R. Chapaneri and S. Shah (2019) A comprehensive survey of machine learning-based network intrusion detection. In Smart Intelligent Computing and Applications, S. C. Satapathy, V. Bhateja, and S. Das (Eds.), Singapore, pp. 345–356. ISBN 978-981-13-1921-1.
  • [7] S. Chopra, R. Hadsell, and Y. LeCun (2005) Learning a similarity metric discriminatively, with application to face verification. In CVPR (1), pp. 539–546.
  • [8] D. Chung, K. Tahboub, and E. J. Delp (2017) A two stream Siamese convolutional neural network for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1983–1991.
  • [9] Y. Chung and W. Weng (2017) Learning deep representations of medical images using Siamese CNNs with application to content-based image retrieval. arXiv preprint arXiv:1711.08490.
  • [10] A. Das, H. Yenala, M. Chinnakotla, and M. Shrivastava (2016) Together we stand: Siamese networks for similar question retrieval. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, pp. 378–387.
  • [11] Y. Duan, M. Andrychowicz, B. Stadie, O. J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba (2017) One-shot imitation learning. In Advances in Neural Information Processing Systems, pp. 1087–1098.
  • [12] L. Fei-Fei, R. Fergus, and P. Perona (2006) One-Shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (4), pp. 594–611. ISSN 0162-8828.
  • [13] Canadian Institute for Cybersecurity. NSL-KDD dataset.
  • [14] C. Galea and R. A. Farrugia (2018) Matching software-generated sketches to face photographs with a very deep CNN, morphed faces, and transfer learning. IEEE Transactions on Information Forensics and Security 13 (6), pp. 1421–1431. ISSN 1556-6013.
  • [15] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang (2015) Single sample face recognition via learning deep supervised autoencoders. IEEE Transactions on Information Forensics and Security 10 (10), pp. 2108–2118. ISSN 1556-6013.
  • [16] V. Garcia and J. Bruna (2017) Few-shot learning with graph neural networks. arXiv preprint arXiv:1711.04043.
  • [17] R. Hadsell, S. Chopra, and Y. LeCun (2006) Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2, pp. 1735–1742.
  • [18] S. Hettich and S. D. Bay (1999) The UCI KDD archive. (Accessed on 06/15/2018).
  • [19] H. Hindy, D. Brosset, E. Bayne, A. Seeam, C. Tachtatzis, R. Atkinson, and X. Bellekens (2018) A taxonomy and survey of intrusion detection system design techniques, network threats and datasets. CoRR abs/1806.03517.
  • [20] S. Jain (2017) NanoNets: how to use deep learning when you have limited data.
  • [21] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (Accessed on 12/07/2018).
  • [22] G. Koch, R. Zemel, and R. Salakhutdinov (2015) Siamese neural networks for One-Shot image recognition. In ICML Deep Learning Workshop, Vol. 2.
  • [23] B. Li, J. Springer, G. Bebis, and M. H. Gunes (2013) A survey of network flow applications. Journal of Network and Computer Applications 36 (2), pp. 567–581.
  • [24] J. Ngiam, D. Peng, V. Vasudevan, S. Kornblith, Q. V. Le, and R. Pang (2018) Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056.
  • [25] L. D. Nguyen, D. Lin, Z. Lin, and J. Cao (2018) Deep CNNs for microscopic image classification by exploiting transfer learning and feature concatenation. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5.
  • [26] S. J. Pan, Q. Yang, et al. (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359.
  • [27] S. Pang, S. Qiao, T. Song, J. Zhao, and P. Zheng (2019) An improved convolutional network architecture based on residual modeling for person re-identification in edge computing. IEEE Access 7, pp. 106749–106760.
  • [28] A. Patcha and J. Park (2007) An overview of anomaly detection techniques: existing solutions and latest technological trends. Computer Networks 51 (12), pp. 3448–3470.
  • [29] Y. Roh, G. Heo, and S. E. Whang (2018) A survey on data collection for machine learning: a Big Data-AI integration perspective. arXiv preprint arXiv:1811.03402.
  • [30] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252.
  • [31] U. Shaham and R. R. Lederman (2018) Learning by coincidence: Siamese networks and common variable learning. Pattern Recognition 74, pp. 52–63. ISSN 0031-3203.
  • [32] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. In ICISSP, pp. 108–116.
  • [33] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer (2015) Discriminative learning of deep convolutional feature point descriptors. In The IEEE International Conference on Computer Vision (ICCV).
  • [34] S. Singh Panwar, Y. Raiwani, and L. S. Panwar (2019) Evaluation of network intrusion detection with features selection and machine learning algorithms on CICIDS-2017 dataset. Available at SSRN 3394103.
  • [35] J. Snell, K. Swersky, and R. Zemel (2017) Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087.
  • [36] Q. Sun, Y. Liu, T. Chua, and B. Schiele (2019) Meta-transfer learning for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 403–412.
  • [37] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani (2009) A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6.
  • [38] R. Thomas and D. Pavithran (2018) A survey of intrusion detection models based on NSL-KDD data set. In 2018 Fifth HCT Information Technology Trends (ITT), pp. 286–291.
  • [39] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, and J. Ortega-Garcia (2018) Exploring recurrent neural networks for on-line handwritten signature biometrics. IEEE Access 6, pp. 5128–5138.
  • [40] R. R. Varior, M. Haloi, and G. Wang (2016) Gated Siamese convolutional neural network architecture for human re-identification. In European Conference on Computer Vision, pp. 791–808.
  • [41] O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra (2016) Matching networks for One Shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638.
  • [42] L. Wang, Y. Li, and S. Wang (2018) Feature learning for One-Shot face recognition. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2386–2390.
  • [43] Q. Wang, X. Zhao, J. Huang, Y. Feng, Z. Liu, J. Su, Z. Luo, and G. Cheng (2017) Addressing complexities of machine learning in big data: principles, trends and challenges from systematical perspectives.
  • [44] K. Weiss, T. M. Khoshgoftaar, and D. Wang (2016) A survey of transfer learning. Journal of Big Data 3 (1), pp. 1–40. ISSN 2196-1115.
  • [45] D. Wu, F. Zhu, and L. Shao (2012) One Shot learning gesture recognition from RGBD images. In 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 7–12. ISSN 2160-7508.
  • [46] Y. Yang, I. Saleemi, and M. Shah (2013) Discovering motion primitives for unsupervised grouping and One-Shot learning of human actions, gestures, and expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (7), pp. 1635–1648. ISSN 0162-8828.
  • [47] Y. Yao, X. Wu, W. Zuo, and D. Zhang (2018) Learning Siamese network with top-down modulation for visual tracking. In International Conference on Intelligent Science and Big Data Engineering, pp. 378–388.
  • [48] W. Yin, H. Schütze, B. Xiang, and B. Zhou (2016) ABCNN: attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics 4 (1), pp. 259–272.
  • [49] Z. Zhang and H. Zhao (2018) One-shot learning for question-answering in Gaokao history challenge. In Proceedings of the 27th International Conference on Computational Linguistics, pp. 449–461.
  • [50] W. Zhu, T. Yao, J. Ni, B. Wei, and Z. Lu (2018) Dependency-based Siamese long short-term memory network for learning sentence representations. PloS One 13 (3), pp. e0193919.