Explaining Differences in Classes of Discrete Sequences

11/06/2020
by   Samaneh Saadat, et al.

Although many machine learning methods can classify and cluster sequences, they do not explain which differences between groups of sequences make those groups distinguishable. A black-box model is sufficient in some cases, but research areas focused on human behavior need greater explainability. For example, psychologists are less interested in a model that predicts human behavior with high accuracy than in identifying the differences between actions that lead to divergent behavior. This paper presents techniques for understanding differences between classes of discrete sequences. These techniques can also be used to interpret black-box machine learning models trained on sequences. The first approach compares k-gram representations of sequences using the silhouette score. The second characterizes differences by analyzing the distance matrix of subsequences. In our first case study, we trained black-box supervised learning methods to classify event sequences of GitHub teams, then used our sequence analysis techniques to measure and characterize differences between the event sequences of teams with bots and teams without bots. In our second case study, we classified Minecraft event sequences to infer their high-level actions and analyzed differences between the low-level event sequences of those actions.
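
The first approach can be illustrated with a small sketch. The abstract does not specify how the k-gram representations are built or which distance is used, so the code below assumes raw k-gram count vectors and Euclidean distance, and treats the class labels as the clustering when computing the silhouette score. The event names and the helper names (`kgram_counts`, `silhouette`, `silhouette_for_k`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def kgram_counts(seq, k):
    """Count every contiguous run of k events in one sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors (Counters)."""
    return sqrt(sum((a[key] - b[key]) ** 2 for key in set(a) | set(b)))


def silhouette(vectors, labels):
    """Mean silhouette coefficient, using the class labels as clusters."""
    scores = []
    for i, (v, label) in enumerate(zip(vectors, labels)):
        # Group distances from sequence i to every other sequence by class.
        by_class = {}
        for j, (w, other) in enumerate(zip(vectors, labels)):
            if i != j:
                by_class.setdefault(other, []).append(euclidean(v, w))
        if label not in by_class or len(by_class) < 2:
            scores.append(0.0)  # assumed convention for degenerate cases
            continue
        # a: mean distance to the own class; b: mean distance to nearest other class.
        a = sum(by_class[label]) / len(by_class[label])
        b = min(sum(d) / len(d) for c, d in by_class.items() if c != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def silhouette_for_k(sequences, labels, k):
    """Silhouette score of the labeled sequences under a k-gram representation."""
    return silhouette([kgram_counts(s, k) for s in sequences], labels)


# Hypothetical toy data: both classes use the same events, in a different order.
sequences = [
    ["push", "comment", "push", "comment"],
    ["push", "comment", "push", "comment", "push", "comment"],
    ["push", "push", "comment", "comment"],
    ["push", "push", "push", "comment", "comment", "comment"],
]
labels = ["with_bots", "with_bots", "without_bots", "without_bots"]

for k in (1, 2):
    print(k, round(silhouette_for_k(sequences, labels, k), 3))
```

In this toy data the two classes have identical event frequencies, so the 1-gram representation cannot separate them, while the 2-gram representation captures ordering and yields a higher score. Comparing scores across values of k in this way gives one rough indication of the subsequence length at which two classes start to differ.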


