Assessment of the Relative Importance of different hyper-parameters of LSTM for an IDS

12/26/2020
by Mohit Sewak, et al.

Recurrent deep-learning language models like the LSTM are often used to provide advanced cyber-defense for high-value assets. The underlying assumption in using LSTM networks for malware detection is that the op-code sequence of a malware binary can be treated like a (spoken) language representation. There are, however, inherent differences between a spoken language (a sequence of words and sentences) and machine language (a sequence of op-codes). In this paper, we demonstrate that, owing to these differences, an LSTM model with a default configuration tuned for a spoken language may not detect malware well (from its op-code sequence) unless the network's essential hyper-parameters are tuned appropriately. In the process, we also determine the relative importance of the different hyper-parameters of an LSTM network as applied to malware detection from op-code sequence representations. We experimented with different LSTM configurations, varying hyper-parameters such as the embedding size, the number of hidden layers, the number of LSTM units per hidden layer, the pruning/padding length of the input vector, the activation function, and the batch size. We found that, owing to the greater complexity of machine language, the performance of an LSTM network configured for an intrusion detection system is very sensitive to the number of hidden layers, the input sequence length, and the choice of activation function. Further, since recurrent architectures by far outperform their non-recurrent counterparts in (spoken) language modeling, we also assess how sequential deep-learning architectures like the LSTM compare against non-sequential counterparts like the MLP-DNN for the purpose of malware detection.
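
To make the tuned hyper-parameters concrete, below is a minimal sketch of an LSTM-based malware classifier over integer-encoded op-code sequences, written in a Keras style. This is not the authors' implementation; every name and value (VOCAB_SIZE, SEQ_LENGTH, the layer and unit counts, etc.) is an illustrative assumption, included only to show where each hyper-parameter from the study enters the model.

```python
# A minimal sketch (not the authors' implementation): a Keras-style
# LSTM malware classifier over integer-encoded op-code sequences.
# All names and values below are illustrative assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE        = 256     # distinct op-codes in the corpus (assumed)
EMBEDDING_SIZE    = 64      # embedding-size
NUM_HIDDEN_LAYERS = 2       # number of hidden (LSTM) layers
LSTM_UNITS        = 128     # LSTM-units in each hidden layer
SEQ_LENGTH        = 500     # pruning/padding-length of the input-vector
ACTIVATION        = "tanh"  # activation-function of the LSTM cells
BATCH_SIZE        = 32      # batch-size

def build_lstm_ids_model():
    model = models.Sequential()
    # Each op-code index is mapped to a dense EMBEDDING_SIZE vector
    model.add(layers.Embedding(input_dim=VOCAB_SIZE,
                               output_dim=EMBEDDING_SIZE))
    for i in range(NUM_HIDDEN_LAYERS):
        # Every LSTM layer except the last must emit the full sequence
        model.add(layers.LSTM(LSTM_UNITS,
                              activation=ACTIVATION,
                              return_sequences=(i < NUM_HIDDEN_LAYERS - 1)))
    model.add(layers.Dense(1, activation="sigmoid"))  # malware vs. benign
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: op-code sequences of varying length are pruned/padded to
# SEQ_LENGTH before training (x_raw: lists of op-code indices):
#   x = pad_sequences(x_raw, maxlen=SEQ_LENGTH, truncating="post")
#   model = build_lstm_ids_model()
#   model.fit(x, y, batch_size=BATCH_SIZE, epochs=10)
```

Replacing the LSTM stack with plain Dense layers over a flattened or averaged embedding would give the kind of non-sequential MLP-DNN baseline the paper compares against.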

Related research:

- 09/23/2021: LSTM Hyper-Parameter Selection for Malware Detection: Interaction Effects and Hierarchical Selection Approach
  Long-Short-Term-Memory (LSTM) networks have shown great promise in artif...

- 12/08/2017: Characterizing the hyper-parameter space of LSTM language models for mixed context applications
  Applying state of the art deep learning models to novel real world datas...

- 07/05/2021: A comparison of LSTM and GRU networks for learning symbolic sequences
  We explore relations between the hyper-parameters of a recurrent neural ...

- 06/14/2021: English to Bangla Machine Translation Using Recurrent Neural Network
  The applications of recurrent neural networks in machine translation are...

- 07/21/2017: Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
  Selecting optimal parameters for a neural network architecture can often...

- 09/21/2019: Dynamic data fusion using multi-input models for malware classification
  Criminals use malware to disrupt cyber-systems. The number of these malw...

- 07/14/2020: Malware Detection for Forensic Memory Using Deep Recurrent Neural Networks
  Memory forensics is a young but fast-growing area of research and a prom...
