SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis

01/22/2023
by Imtiaz Karim, et al.

5G is the fifth-generation cellular network protocol. It is the state-of-the-art global wireless standard, designed to connect virtually everyone and everything with increased speed and reduced latency. Its development, analysis, and security are therefore critical. However, all approaches to 5G protocol development and security analysis, e.g., property extraction, protocol summarization, and semantic analysis of the protocol specifications and implementations, are completely manual. To reduce this manual effort, in this paper we curate SPEC5G, the first-ever public 5G dataset for NLP research. The dataset contains 3,547,586 sentences with 134M words, drawn from 13,094 cellular network specifications and 13 online websites. Leveraging large-scale pre-trained language models that have achieved state-of-the-art results on NLP tasks, we use this dataset for two tasks: security-related text classification and summarization. Security-related text classification can be used to extract security-related properties for protocol testing. Summarization can help developers and practitioners understand the protocol at a high level, which is itself a daunting task. Our results show the value of our 5G-centric dataset for automating 5G protocol analysis. We believe that SPEC5G will enable a new research direction into automatic analyses of the 5G cellular network protocol and numerous related downstream tasks. Our data and code are publicly available.
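
To make the security-related text classification task concrete, the following is a minimal sketch of a naive keyword-matching baseline. It is not the method used in the paper, which relies on pre-trained language models. The keyword list, the function names, and the sample sentences are all assumptions invented for illustration; the sentences are written in the style of specification text and are not quoted from any 3GPP document.

```python
import re

# Hypothetical keyword list for illustration; not taken from the SPEC5G paper.
SECURITY_KEYWORDS = {
    "authentication", "integrity", "ciphering", "encryption", "replay",
}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def is_security_related(sentence, keywords=SECURITY_KEYWORDS):
    """Return True if the sentence contains at least one keyword as a whole token."""
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return bool(tokens & keywords)


# Invented sample text in the style of a protocol specification.
sample = (
    "The UE shall start timer T3510. "
    "The AMF shall initiate the authentication procedure. "
    "Integrity protection shall be applied to NAS messages."
)

sentences = split_sentences(sample)
flagged = [s for s in sentences if is_security_related(s)]

for s in flagged:
    print(s)
```

A baseline like this misses paraphrases and flags keywords used in unrelated senses, which is one reason a learned classifier trained on domain text is attractive for this task.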


