Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence

07/01/2022
by Peipei Liu, et al.

Extracting cybersecurity entities such as attackers and vulnerabilities from unstructured network texts is an important part of security analysis. However, intelligence data are sparse because cybersecurity entity names vary frequently and are chosen somewhat arbitrarily, which makes it difficult for current methods to extract security-related concepts and entities well. To this end, we propose a semantic augmentation method that incorporates different linguistic features to enrich the representation of input tokens for detecting and classifying cybersecurity names in unstructured text. In particular, we encode and aggregate the constituent feature, morphological feature, and part-of-speech feature of each input token to improve the robustness of the method. Moreover, each token receives augmented semantic information from two sources: its K most similar words in a cybersecurity domain corpus, where an attentive module weighs the differences among these words, and contextual clues drawn from a large-scale general-domain corpus. We conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
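The step in which a token draws information from its K most similar domain words through an attentive module can be illustrated with a small sketch. It assumes a toy in-memory embedding table (`vocab`), cosine similarity for choosing the K nearest words, dot-product scores followed by a softmax for the attention weights, and concatenation to combine the token vector with its augmentation. These choices and all function names are hypothetical illustrations, not necessarily the paper's formulation, and the sketch does not model the constituent, morphological, or part-of-speech features or the general-domain contextual encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_similar(word: str, vocab: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k words in the (hypothetical) domain table closest to `word`."""
    target = vocab[word]
    scored = [(w, cosine(target, e)) for w, e in vocab.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def augment(
    token_vec: List[float], word: str, vocab: Dict[str, List[float]], k: int
) -> Tuple[List[float], List[str], List[float]]:
    """Attend over the k nearest domain words and concatenate the result
    to the token vector (dot-product scoring and concatenation are assumptions)."""
    dim = len(token_vec)
    neighbors = top_k_similar(word, vocab, k)
    if not neighbors:
        # No neighbors available: pad the augmentation with zeros.
        return list(token_vec) + [0.0] * dim, [], []
    neighbor_vecs = [vocab[w] for w in neighbors]
    # Score each neighbor against the token representation.
    scores = [sum(a * b for a, b in zip(token_vec, v)) for v in neighbor_vecs]
    weights = softmax(scores)
    # Attention-weighted sum of neighbor embeddings.
    aug = [sum(w * v[i] for w, v in zip(weights, neighbor_vecs)) for i in range(dim)]
    return list(token_vec) + aug, neighbors, weights


# Toy usage with made-up 2-d embeddings.
vocab = {
    "ransomware": [1.0, 0.0],
    "malware": [0.9, 0.1],
    "trojan": [0.8, 0.2],
    "apple": [0.0, 1.0],
}
vec, neighbors, weights = augment(vocab["ransomware"], "ransomware", vocab, k=2)
```

In this toy run, "malware" and "trojan" are selected as the two nearest words and "malware" receives the larger attention weight, so the augmented half of the vector leans toward its embedding. A real model would learn the scoring function rather than use a fixed dot product.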
