REST: A thread embedding approach for identifying and classifying user-specified information in security forums

01/08/2020
by Joobin Gharibshah, et al.

How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offerings of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty of specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a possibly incomplete bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads, and classes into appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3% and 76.9%. We see our approach as a first step toward harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
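The embedding idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three-dimensional word vectors, the similarity threshold, the keyword boost, and the function names (`expand_keywords`, `embed_thread`, `classify`) are all assumptions made for illustration, and only three of the four classes are shown. It expands a seed keyword set by cosine similarity, embeds a thread as a weighted average of its word vectors, and assigns the class whose embedding is closest.

```python
import math
from collections import Counter

# Hypothetical toy word vectors; a real system would learn them from forum text.
WORD_VECS = {
    "ddos":     [0.9, 0.1, 0.0],
    "attack":   [0.8, 0.2, 0.0],
    "botnet":   [0.7, 0.3, 0.1],
    "sell":     [0.1, 0.9, 0.0],
    "price":    [0.0, 0.9, 0.1],
    "tutorial": [0.1, 0.0, 0.9],
    "guide":    [0.0, 0.1, 0.9],
}
DIM = 3


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_keywords(seed, threshold=0.9):
    """Add every vocabulary word that is close to some seed keyword."""
    known = [s for s in seed if s in WORD_VECS]
    expanded = set(seed)
    for word, vec in WORD_VECS.items():
        if any(cosine(vec, WORD_VECS[s]) >= threshold for s in known):
            expanded.add(word)
    return expanded


def embed_thread(tokens, keywords=frozenset(), boost=2.0):
    """Weighted average of word vectors; keyword hits count `boost` times."""
    counts = Counter(t for t in tokens if t in WORD_VECS)
    total, weight_sum = [0.0] * DIM, 0.0
    for word, n in counts.items():
        w = n * (boost if word in keywords else 1.0)
        total = [t + w * x for t, x in zip(total, WORD_VECS[word])]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def classify(thread_vec, class_vecs):
    """Return the class label whose vector is most similar to the thread."""
    return max(class_vecs, key=lambda c: cosine(thread_vec, class_vecs[c]))


# Class embeddings built from a few descriptive words per class (assumed).
CLASS_VECS = {
    "alerts":   embed_thread(["ddos", "attack"]),
    "services": embed_thread(["sell", "price"]),
    "hacks":    embed_thread(["tutorial", "guide"]),
}

keywords = expand_keywords({"ddos"})
thread = "selling botnet access cheap price sell".split()
label = classify(embed_thread(thread, keywords), CLASS_VECS)
```

In this toy setup the seed `{"ddos"}` grows to include `attack` and `botnet`, so a thread that never mentions "ddos" can still be matched through an expanded keyword, which mirrors the recovery from missing keywords described above.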


