Toward More Generalized Malicious URL Detection Models

02/21/2022
by YunDa Tsai, et al.

This paper reveals a data bias issue that can severely degrade the performance of machine learning models for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train classification models. We then propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of biased features. The solution is based on self-supervised adversarial training, which trains deep neural networks to learn invariant embeddings from biased data. We conduct a wide range of experiments demonstrating that the proposed strategy can lead to significantly better generalization for both CNN-based and RNN-based detection models.
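The abstract does not spell out the training objective. One generic way to realise adversarial debiasing is gradient reversal: an encoder produces an embedding from which a task head predicts the label while an adversary head tries to predict a bias attribute, and the encoder is pushed to make that adversary fail. The sketch below is a toy, pure-Python illustration of that generic idea under stated assumptions, not the authors' method. The linear encoder, the two logistic heads, the synthetic "shortcut" feature, the names `make_biased_data`, `encoder_gradient` and `train`, and all hyperparameters are hypothetical. The bias target is read off the shortcut feature itself, as a loose stand-in for the self-supervised setting.

```python
# Hypothetical toy sketch of adversarial debiasing via gradient reversal.
# It illustrates the generic idea only; it is not the paper's exact method.
import math
import random


def sigmoid(t):
    # Numerically stable logistic function (exp is only called on values <= 0).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def make_biased_data(n, agreement, seed=0):
    """Synthetic rows (x, y, s). Feature 0 carries the real signal; feature 1 is a
    shortcut whose sign matches the label y with probability `agreement`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = (1.0 if y else -1.0) + rng.gauss(0.0, 1.0)
        s = y if rng.random() < agreement else 1 - y
        # The bias target s is read off the shortcut feature itself, so no
        # extra annotation is needed (a stand-in for self-supervision).
        rows.append(([signal, 1.0 if s else -1.0], y, s))
    return rows


def encoder_gradient(w, x, y, s, task_head, bias_head, lam):
    """Gradient for a linear encoder z = w . x: descend the task loss and
    ascend the bias head's loss (gradient reversal scaled by lam)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    a, b = task_head
    c, d = bias_head
    g_task = a * (sigmoid(a * z + b) - y)  # d(task BCE)/dz
    g_bias = c * (sigmoid(c * z + d) - s)  # d(bias BCE)/dz
    g_z = g_task - lam * g_bias            # reversed bias gradient
    return [g_z * xi for xi in x]


def train(rows, lam=1.0, lr=0.05, epochs=10):
    """Simultaneous updates: both heads minimise their own BCE, while the
    encoder follows encoder_gradient."""
    w = [0.1, 0.1]
    task_head = [1.0, 0.0]
    bias_head = [1.0, 0.0]
    for _ in range(epochs):
        for x, y, s in rows:
            z = sum(wi * xi for wi, xi in zip(w, x))
            g_w = encoder_gradient(w, x, y, s, task_head, bias_head, lam)
            err_y = sigmoid(task_head[0] * z + task_head[1]) - y
            err_s = sigmoid(bias_head[0] * z + bias_head[1]) - s
            task_head[0] -= lr * err_y * z
            task_head[1] -= lr * err_y
            bias_head[0] -= lr * err_s * z
            bias_head[1] -= lr * err_s
            w = [wi - lr * gi for wi, gi in zip(w, g_w)]
    return w, task_head, bias_head


rows = make_biased_data(200, agreement=0.9, seed=0)
w, task_head, bias_head = train(rows, lam=1.0)
print("encoder weights [signal, shortcut]:", w)
```

Setting `lam=0` reduces the encoder update to ordinary training on the biased data, while a positive `lam` penalises embeddings from which the shortcut can be recovered. In a real detector, the linear encoder would be replaced by a CNN- or RNN-based network over URL characters or tokens, and the reversal would typically be implemented as a custom backward pass in a deep learning framework.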
