Word-level Human Interpretable Scoring Mechanism for Novel Text Detection Using Tsetlin Machines

05/10/2021
by Bimal Bhattarai, et al.

Recent research in novelty detection focuses mainly on document-level classification using deep neural networks (DNNs). However, the black-box nature of DNNs makes it difficult to explain exactly why a document is considered novel. In addition, handling novelty at the word level is crucial for providing a more fine-grained analysis than is available at the document level. In this work, we propose a Tsetlin machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of the novel documents using the linguistic patterns captured by TM clauses. We then use this description to measure how much each word contributes to making a document novel. Our experimental results show that the approach breaks novelty down into interpretable phrases while still measuring novelty successfully.
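
The idea of scoring words through clauses can be illustrated with a toy sketch. It is not the scoring formula from the paper: it assumes each clause is reduced to the set of words it requires (negated literals are ignored), that the clauses have already been split into those describing novel documents and those describing known documents, and that a word's score is simply the difference between how many clauses of each kind contain it. The function name `word_novelty_scores` and the example clauses are hypothetical.

```python
from collections import Counter


def word_novelty_scores(novel_clauses, known_clauses):
    """Toy word-level novelty score (an assumption, not the paper's formula).

    Each clause is a set of words that must all be present (a conjunction).
    A word scores +1 for every novel-class clause that contains it and
    -1 for every known-class clause that contains it.
    """
    novel_counts = Counter(w for clause in novel_clauses for w in clause)
    known_counts = Counter(w for clause in known_clauses for w in clause)
    vocabulary = set(novel_counts) | set(known_counts)
    return {w: novel_counts[w] - known_counts[w] for w in vocabulary}


# Hypothetical clauses, written by hand for illustration only.
novel_clauses = [{"quantum", "chip"}, {"quantum", "launch"}, {"chip"}]
known_clauses = [{"launch", "rocket"}, {"rocket"}]

scores = word_novelty_scores(novel_clauses, known_clauses)

# Rank words from most to least novel; ties are broken alphabetically.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for word, score in ranked:
    print(f"{word}: {score}")
```

In this sketch, words that appear mostly in novel-class clauses get positive scores and words tied to known-class clauses get negative ones, which is the kind of per-word breakdown the abstract describes. A real system would take the clauses from a trained Tsetlin machine instead of writing them by hand.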

research
11/17/2020

Measuring the Novelty of Natural Language Text Using the Conjunctive Clauses of a Tsetlin Machine Text Classifier

Most supervised text classification approaches assume a closed world, co...
research
02/20/2018

TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Detecting novelty of an entire document is an Artificial Intelligence (A...
research
10/31/2022

Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities

Much of the existing work on text novelty detection has been studied at ...
research
03/01/2021

BERT based patent novelty search by training claims to their own description

In this paper we present a method to concatenate patent claims to their ...
research
11/11/2018

Learning Groupwise Scoring Functions Using Deep Neural Networks

While in a classification or a regression setting a label or a value is ...
research
11/24/2018

Novelty and Coverage in context-based information filtering

We present a collection of algorithms to filter a stream of documents in...
research
12/02/2019

Learning Word Ratings for Empathy and Distress from Document-Level User Responses

Despite the excellent performance of black box approaches to modeling se...
