Shell Language Processing: Unix command parsing for Machine Learning

07/06/2021
by Dmitrijs Trizna, et al.

In this article, we present a Shell Language Preprocessing (SLP) library, which implements tokenization and encoding aimed at parsing Unix and Linux shell commands. We describe the rationale for a new approach, with specific examples where conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against widely accepted information and communications technology (ICT) tokenization techniques and achieve a significant improvement in F1-score, from 0.392 to 0.874.
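
To illustrate the kind of failure the abstract refers to, the following is a minimal sketch that contrasts a word-oriented splitter, of the sort common in NLP pipelines, with a shell-aware lexer. It is not the SLP library's API: the example command and the function names `naive_word_tokens` and `shell_tokens` are assumptions made for illustration, and the shell-aware side uses Python's standard `shlex` module as a stand-in.

```python
import re
import shlex

# Hypothetical command, chosen only to exercise flags, quoting, pipes and redirects.
command = "grep -rn 'api key' /var/log/*.log | awk '{print $2}' > /tmp/out.txt"


def naive_word_tokens(text):
    """NLP-style tokenization: keep only runs of word characters."""
    return re.findall(r"\w+", text)


def shell_tokens(text):
    """Shell-aware tokenization with the standard library's shlex.

    punctuation_chars=True makes operators such as | and > separate tokens;
    whitespace_split=True keeps paths and flags intact.
    """
    lexer = shlex.shlex(text, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


if __name__ == "__main__":
    print(naive_word_tokens(command))
    print(shell_tokens(command))
```

In this sketch the word-oriented splitter discards the flag dash, the pipe, the redirect, and the path separators, and breaks the quoted argument in two, while the shell-aware lexer keeps the flag, the quoted argument, the path, and the operators as distinct tokens.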
