ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

02/13/2022
by Xinjie Lin, et al.

Encrypted traffic classification requires discriminative and robust traffic representations captured from content-invisible and imbalanced traffic data for accurate classification, which is challenging but indispensable for network security and network management. The major limitation of existing solutions is that they rely heavily on deep features, which are overly dependent on data size and hard to generalize to unseen data. How to leverage open-domain unlabeled traffic data to learn representations with strong generalization ability remains a key challenge. In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representations from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small amount of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks, remarkably pushing the F1 of ISCX-Tor to 99.2 (a 4.4 improvement), Cross-Platform (Android) to 92.5, and CSTNET-TLS 1.3 to 97.4. Notably, we provide an explanation of the empirically powerful pre-training model by analyzing the randomness of ciphers, which gives us insights into the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT.
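The workflow described in the abstract is the standard BERT recipe applied to datagrams: a transformer encoder is pre-trained on large unlabeled traffic, then a classification head is fine-tuned on a small labeled set. The sketch below is an illustrative approximation only, not the authors' implementation (which lives in the linked repository); it assumes a hypothetical pre-trained checkpoint in a pretrained_et_bert directory, uses the Hugging Face transformers fine-tuning API, and takes packet payloads that have already been tokenized into space-separated hex tokens.

# Illustrative sketch of fine-tuning a pre-trained traffic encoder for classification.
# Not the authors' code; MODEL_DIR, NUM_CLASSES, and the toy samples are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

MODEL_DIR = "pretrained_et_bert"   # hypothetical directory holding a pre-trained checkpoint
NUM_CLASSES = 120                  # e.g. number of traffic classes in the target task

tokenizer = BertTokenizerFast.from_pretrained(MODEL_DIR)
model = BertForSequenceClassification.from_pretrained(MODEL_DIR, num_labels=NUM_CLASSES)

def encode(samples, labels):
    # samples: datagram strings of space-separated hex tokens, e.g. "1603 0301 0200 ..."
    batch = tokenizer(samples, truncation=True, padding="max_length",
                      max_length=128, return_tensors="pt")
    return TensorDataset(batch["input_ids"], batch["attention_mask"],
                         torch.tensor(labels))

# Tiny toy labeled set standing in for a task-specific fine-tuning corpus.
train_texts = ["1603 0301 0200 0001", "1714 0303 0045 00ff"]
train_labels = [0, 1]
loader = DataLoader(encode(train_texts, train_labels), batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        out.loss.backward()   # cross-entropy over the classification head
        optimizer.step()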


research 09/13/2021 · Effectiveness of Pre-training for Few-shot Intent Classification
This paper investigates the effectiveness of pre-training for few-shot i...

research 05/02/2023 · BrainNPT: Pre-training of Transformer networks for brain network classification
Deep learning methods have advanced quickly in brain imaging analysis ov...

research 09/24/2019 · Understanding Semantics from Speech Through Pre-training
End-to-end Spoken Language Understanding (SLU) is proposed to infer the ...

research 06/02/2019 · Pre-training of Graph Augmented Transformers for Medication Recommendation
Medication recommendation is an important healthcare application. It is ...

research 12/14/2020 · Differentiation of Sliding Rescaled Ranges: New Approach to Encrypted and VPN Traffic Detection
We propose a new approach to traffic preprocessing called Differentiatio...

research 05/16/2019 · Latent Universal Task-Specific BERT
This paper describes a language representation model which combines the ...

research 08/27/2021 · MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
Autonomous driving has attracted much attention over the years but turns...
