Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization with Medical Applications

09/12/2018
by   Geir Thore Berge, et al.
0

Medical applications challenge today's text categorization techniques by demanding both high accuracy and ease-of-interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the sacrifice of interpretability. To address this accuracy-interpretability challenge, we here introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. In all brevity, we represent the terms of a text as propositional variables. From these, we capture categories using simple propositional formulae, such as: if "rash" and "reaction" and "penicillin" then Allergy. The Tsetlin Machine learns these formulae from a labelled text, utilizing conjunctive clauses to represent the particular facets of each category. Indeed, even the absence of terms (negated features) can be used for categorization purposes. Our empirical results are quite conclusive. The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset. On average, the Tsetlin Machine delivers the best recall and precision scores across the datasets. The GPU implementation of the Tsetlin Machine is further 8 times faster than the GPU implementation of the neural network. We thus believe that our novel approach can have a significant impact on a wide range of text analysis applications, forming a promising starting point for deeper natural language understanding with the Tsetlin Machine.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/11/2022

Learning Mutual Fund Categorization using Natural Language Processing

Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long...
research
09/13/2023

Beyond original Research Articles Categorization via NLP

This work proposes a novel approach to text categorization – for unknown...
research
04/22/2017

Medical Text Classification using Convolutional Neural Networks

We present an approach to automatically classify clinical text at a sent...
research
02/22/2021

A Relational Tsetlin Machine with Applications to Natural Language Understanding

TMs are a pattern recognition approach that uses finite state machines f...
research
10/27/2016

A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks

Sarcasm detection is a key task for many natural language processing tas...
research
03/27/2013

Machine Generalization and Human Categorization: An Information-Theoretic View

In designing an intelligent system that must be able to explain its reas...
research
04/04/2018

The Tsetlin Machine - A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic

Although simple individually, artificial neurons provide state-of-the-ar...

Please sign up or login with your details

Forgot password? Click here to reset