Constructing Flow Graphs from Procedural Cybersecurity Texts

05/29/2021
by Kuntal Kumar Pal, et al.

Following procedural texts written in natural language is challenging. A reader must go through the whole text to pick out the relevant information and work out the flow of instructions needed to complete a task, a process that is prone to error. If such texts were structured, we could readily visualize instruction flows, reason about or infer a particular step, or even build automated systems that help novice agents achieve a goal. Recovering this structure is difficult, however, because procedural texts are so diverse. This paper proposes to identify the relevant information in such texts and to generate information flows between sentences. We built a large annotated procedural text dataset (CTFW) in the cybersecurity domain (3154 documents). The dataset contains valuable instructions drawn from software vulnerability analysis experiences. We performed extensive experiments on CTFW with our LM-GNN model variants in multiple settings. To show the generalizability of both the task and our method, we also experimented with procedural texts from two other domains (Maintenance Manual and Cooking), which differ substantially from cybersecurity. Our experiments show that a Graph Convolution Network with BERT sentence embeddings outperforms BERT in all three domains.
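
To make the idea of a graph convolution over sentence embeddings concrete, here is a minimal sketch of one propagation step in plain Python. It is not the paper's LM-GNN model. It assumes the commonly used symmetric-normalized formulation, ReLU(D^-1/2 (A + I) D^-1/2 H W), an undirected sentence graph, and toy 2-dimensional vectors standing in for BERT sentence embeddings. The function name `gcn_layer`, the example graph, and the weights are all hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    -- n x n symmetric 0/1 adjacency matrix over sentences (assumed)
    feats  -- n x d_in sentence feature vectors (stand-ins for BERT embeddings)
    weight -- d_in x d_out projection matrix (learned in a real model)
    """
    n = len(adj)
    d_in = len(feats[0])
    d_out = len(weight[0])

    # Add self-loops so each sentence keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]

    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]

    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbour features (norm @ feats).
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(d_in)]
           for i in range(n)]

    # Linear projection (agg @ weight) followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


# Hypothetical three-sentence procedure: s0 - s1 - s2 linked in a chain.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0]]
weight = [[1.0, 0.0],
          [0.0, 1.0]]  # identity, so the output shows pure neighbour mixing

hidden = gcn_layer(adj, feats, weight)
```

After this step, each sentence vector is a degree-weighted mix of its own features and those of its neighbours. In a full system, the resulting node representations could feed a classifier that predicts whether an information-flow edge exists between a pair of sentences.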


