Outlier Detection for Text Data: An Extended Version

01/05/2017
by Ramakrishnan Kannan, et al.

Outlier detection is extremely challenging in domains such as text, where attribute values are non-negative and most of them are zero. In such sparse data it is often difficult to separate outliers from the natural variation in the underlying patterns. In this paper, we present a matrix factorization method that distinguishes anomalies naturally by using low-rank approximations of the underlying data. Our iterative algorithm, TONMF, is based on the block coordinate descent (BCD) framework. We define blocks over the term-document matrix such that the subproblem for each block becomes solvable. We then update one block at a time to its optimal value, given the most recently updated values of the other blocks. Our approach has significant advantages over traditional methods for text outlier detection. Finally, we present experimental results illustrating the effectiveness of our method over competing methods.
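The block-wise update scheme described in the abstract can be illustrated with a small sketch. It is not the paper's TONMF implementation: the abstract does not state the objective, so the code assumes a model of the form A ≈ WH + Z with non-negative W and H, and a column-wise penalty weighted by `alpha` on the outlier matrix Z. The function name `low_rank_outlier_scores`, its parameters, and the projected least-squares updates for W and H are illustrative assumptions. The projected updates are a simple heuristic and are not, in general, the exact block-optimal solves the abstract refers to.

```python
import numpy as np


def low_rank_outlier_scores(A, rank=2, n_iter=50, alpha=1.0, seed=0):
    """Score each column (document) of a non-negative term-document matrix.

    Assumed model (not taken from the source): A ~= W @ H + Z, with
    W >= 0, H >= 0, and a penalty alpha * sum_j ||Z[:, j]||_2 on Z.
    A larger column norm in Z means the document is fitted worse by the
    low-rank part and is therefore a stronger outlier candidate.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    Z = np.zeros_like(A)
    eps = 1e-12

    for _ in range(n_iter):
        D = A - Z  # the part of the data the low-rank factors must explain

        # Block W (H and Z fixed): least squares, then projection onto W >= 0.
        W = np.maximum(0.0, D @ np.linalg.pinv(H))

        # Block H (W and Z fixed): least squares, then projection onto H >= 0.
        H = np.maximum(0.0, np.linalg.pinv(W) @ D)

        # Block Z (W and H fixed): column-wise shrinkage of the residual,
        # the closed-form minimizer of ||R - Z||_F^2 + alpha * sum_j ||Z[:, j]||_2.
        R = A - W @ H
        norms = np.linalg.norm(R, axis=0)
        scale = np.maximum(0.0, 1.0 - alpha / (2.0 * np.maximum(norms, eps)))
        Z = R * scale

    return np.linalg.norm(Z, axis=0)


# Toy data: nine documents share the first ten terms, while document 7
# uses only the last ten terms and so does not fit the shared pattern.
A = np.zeros((20, 10))
A[:10, :] = 1.0 + 0.1 * np.arange(10)
A[:, 7] = 0.0
A[10:, 7] = 1.0

scores = low_rank_outlier_scores(A, rank=1)
print(np.argsort(scores)[::-1])  # document indices, most outlying first
```

Each pass updates one block while holding the other two at their most recent values, which is the BCD pattern the abstract describes. The per-document score is the column norm of Z, so documents that the low-rank factors cannot explain receive the highest scores.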

research
03/23/2020

A Block Coordinate Descent-based Projected Gradient Algorithm for Orthogonal Non-negative Matrix Factorization

This article utilizes the projected gradient method (PG) for a non-negat...
research
05/17/2017

REMIX: Automated Exploration for Interactive Outlier Detection

Outlier detection is the identification of points in a dataset that do n...
research
06/02/2019

Truncated Cauchy Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) minimizes the Euclidean distance...
research
06/01/2014

l_1-regularized Outlier Isolation and Regression

This paper proposed a new regression model called l_1-regularized outlie...
research
02/25/2018

DID: Distributed Incremental Block Coordinate Descent for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) has attracted much attention in t...
research
04/12/2017

Provable Self-Representation Based Outlier Detection in a Union of Subspaces

Many computer vision tasks involve processing large amounts of data cont...
