Botcha: Detecting Malicious Non-Human Traffic in the Wild

03/02/2021
by Sunny Dhamnani, et al.

Malicious bots make up about a quarter of all traffic on the web and degrade the performance of personalization and recommendation algorithms that operate on e-commerce sites. Positive-Unlabeled learning (PU learning) provides the ability to train a binary classifier using only positive (P) and unlabeled (U) instances, where the unlabeled data comprises both positive and negative classes. It is possible to find labels for strict subsets of non-malicious actors, e.g., under the assumption that only humans purchase during web sessions or clear CAPTCHAs. However, finding signals of malicious behavior is almost impossible due to the ever-evolving and adversarial nature of bots. Such a set-up naturally lends itself to PU learning. Unfortunately, standard PU learning approaches assume that the labeled set of positives is a random sample of all positives, which is unlikely to hold in practice. In this work, we propose two modifications to PU learning that make it more robust to violations of the selected-completely-at-random assumption, leading to a system that can filter out malicious bots. On one public and one proprietary dataset, we show that the proposed approaches are better at identifying humans in web data than standard PU learning methods.
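To make the selected-completely-at-random (SCAR) assumption concrete, here is a minimal sketch of the kind of standard PU baseline the abstract refers to. It is not the paper's proposed method. Under SCAR, the probability that an instance is labeled satisfies p(s=1|x) = c · p(y=1|x), where c = p(s=1|y=1) is a constant label frequency, so a classifier trained to separate labeled from unlabeled instances can be rescaled by an estimate of c. The toy data, the 1-D feature, and all function names (`make_toy_data`, `fit_logistic`, `scar_pu_scores`) are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ss, epochs=300, lr=0.1):
    """Fit a 1-D logistic model g(x) ~ p(s=1 | x) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n=400, label_rate=0.5, seed=0):
    """Hypothetical data: humans (y=1) near +2, bots (y=0) near -2.

    Humans are labeled (s=1) completely at random with probability
    label_rate, which is exactly the SCAR assumption. Bots are never labeled.
    """
    rng = random.Random(seed)
    xs, ys, ss = [], [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(2.0 if y else -2.0, 1.0)
        s = 1 if (y == 1 and rng.random() < label_rate) else 0
        xs.append(x)
        ys.append(y)
        ss.append(s)
    return xs, ys, ss


def scar_pu_scores(xs, ss, holdout_frac=0.2, seed=0):
    """Return (scores, c_hat): estimates of p(y=1 | x) and of c = p(s=1 | y=1)."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(len(idx) * holdout_frac)
    hold, train = idx[:cut], idx[cut:]

    # Step 1: train a "labeled vs. unlabeled" classifier g(x) ~ p(s=1 | x).
    w, b = fit_logistic([xs[i] for i in train], [ss[i] for i in train])

    # Step 2: estimate c as the mean of g over held-out labeled positives.
    held_pos = [sigmoid(w * xs[i] + b) for i in hold if ss[i] == 1]
    if not held_pos:
        raise ValueError("no labeled positives in the hold-out split")
    c_hat = sum(held_pos) / len(held_pos)

    # Step 3: rescale, p(y=1 | x) ~ g(x) / c, clipped to a valid probability.
    scores = [min(1.0, sigmoid(w * x + b) / c_hat) for x in xs]
    return scores, c_hat


if __name__ == "__main__":
    xs, ys, ss = make_toy_data()
    scores, c_hat = scar_pu_scores(xs, ss)
    print("estimated label frequency c:", round(c_hat, 3))
```

The rescaling in step 3 is only justified when labeled positives are a uniform random sample of all positives. If, for instance, only purchasing humans are labeled, then c varies with x and the corrected scores become biased, which is the kind of violation the abstract is concerned with.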

research
03/14/2023

PULSNAR – Positive unlabeled learning selected not at random: class proportion estimation when the SCAR assumption does not hold

Positive and Unlabeled (PU) learning is a type of semi-supervised binary...
research
08/27/2018

Learning from Positive and Unlabeled Data under the Selected At Random Assumption

For many interesting tasks, such as medical diagnosis and web page class...
research
04/26/2015

Assessing binary classifiers using only positive and unlabeled data

Assessing the performance of a learned model is a crucial part of machin...
research
02/02/2017

Recovering True Classifier Performance in Positive-Unlabeled Learning

A common approach in positive-unlabeled learning is to train a classific...
research
09/10/2018

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Most positive and unlabeled data is subject to selection biases. The lab...
research
10/15/2019

Learning Classifiers on Positive and Unlabeled Data with Policy Gradient

Existing algorithms aiming to learn a binary classifier from positive (P...
research
02/13/2014

A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models

We present a novel approach to learn binary classifiers when only positi...
