STIXnet: A Novel and Modular Solution for Extracting All STIX Objects in CTI Reports

03/17/2023
by Francesco Marchiori, et al.

The automatic extraction of information from Cyber Threat Intelligence (CTI) reports is crucial in risk management. The increasing frequency with which these reports are published has led researchers to develop new systems that automatically recover different types of entities and relations from textual data. Most state-of-the-art models leverage Natural Language Processing (NLP) techniques, which perform well when extracting a few types of entities at a time but cannot detect heterogeneous data or their relations. Furthermore, several paradigms, such as STIX, have become de facto standards in the CTI community; they dictate a formal categorization of entities and relations so that organizations can share data consistently. This paper presents STIXnet, the first solution for the automated extraction of all STIX entities and relationships in CTI reports. Using NLP techniques and an interactive Knowledge Base (KB) of entities, our approach obtains F1 scores comparable to state-of-the-art models for entity extraction (0.916) and relation extraction (0.724) while considering significantly more types of entities and relations. Moreover, STIXnet is a modular and extensible framework that manages and coordinates different modules and merges their contributions into a single, exhaustive result without duplicates. With our approach, researchers and organizations can extend their Information Extraction (IE) capabilities by integrating several existing techniques instead of developing new tools from scratch.
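The abstract describes STIXnet as a framework that coordinates several extraction modules, one of them backed by a knowledge base of known entities, and merges their outputs. The following is a minimal sketch of that merge idea only, not STIXnet's actual implementation. Everything in it is a hypothetical illustration: the module functions `kb_module` and `uses_module`, the toy KB, the regex relation rule, and the deterministic UUID namespace. The output objects are STIX-like dicts with only `type`, `id` and a few fields; real STIX 2.1 objects require further properties such as `spec_version`, `created` and `modified`.

```python
import re
import uuid
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical namespace, used only to derive deterministic example IDs.
NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def stix_id(stix_type: str, key: str) -> str:
    """Build a STIX-style identifier of the form '<type>--<UUID>'."""
    return f"{stix_type}--{uuid.uuid5(NAMESPACE, f'{stix_type}:{key.lower()}')}"


@dataclass
class Extraction:
    entities: set[tuple[str, str]] = field(default_factory=set)        # (stix_type, name)
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (source, rel_type, target)


# A module is any callable that maps report text to an Extraction.
Module = Callable[[str], Extraction]


def kb_module(kb: dict[str, str]) -> Module:
    """Hypothetical KB module: whole-word, case-insensitive lookup of known names."""
    def run(text: str) -> Extraction:
        out = Extraction()
        for name, stix_type in kb.items():
            if re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE):
                out.entities.add((stix_type, name))
        return out
    return run


def uses_module(text: str) -> Extraction:
    """Hypothetical toy relation extractor matching '<Name> uses <Name>'."""
    out = Extraction()
    for src, dst in re.findall(r"\b([A-Z][\w-]*) uses ([A-Z][\w-]*)\b", text):
        out.relations.add((src, "uses", dst))
    return out


def run_pipeline(text: str, modules: list[Module]) -> list[dict]:
    """Run every module, merge results without duplicates, emit STIX-like dicts."""
    merged = Extraction()
    for module in modules:
        result = module(text)
        merged.entities |= result.entities    # set union drops repeated findings
        merged.relations |= result.relations
    ids = {name: stix_id(t, name) for t, name in merged.entities}
    objects = [{"type": t, "id": ids[name], "name": name}
               for t, name in sorted(merged.entities)]
    for src, rel, dst in sorted(merged.relations):
        if src in ids and dst in ids:  # keep only relations between extracted entities
            objects.append({
                "type": "relationship",
                "id": stix_id("relationship", f"{src}|{rel}|{dst}"),
                "relationship_type": rel,
                "source_ref": ids[src],
                "target_ref": ids[dst],
            })
    return objects


if __name__ == "__main__":
    kb = {"APT28": "intrusion-set", "X-Agent": "malware"}
    report = "APT28 uses X-Agent to exfiltrate data."
    for obj in run_pipeline(report, [kb_module(kb), uses_module]):
        # Prints: "intrusion-set APT28", "malware X-Agent", "relationship uses"
        print(obj["type"], obj.get("name", obj.get("relationship_type")))
```

Keeping each module behind the same `Callable[[str], Extraction]` interface is what makes the sketch extensible: a new extractor is added to the list passed to `run_pipeline` without touching the merge logic.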
