Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

01/31/2022
by Pengyu Li, et al.

Classification and topic modeling are popular machine learning techniques for extracting information from large-scale datasets. Methods that incorporate a priori information, such as labels or important features, have been developed for both tasks; however, most methods that can perform both do not allow the topics or features to be guided. In this paper, we propose Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), a method that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the method on legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that the proposed method improves both classification accuracy and topic coherence compared with earlier methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
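The abstract does not state the objective function, so the following is only a minimal sketch of how label supervision and seed-word guidance could be combined in one factorization. It assumes a Frobenius-norm objective of the form ||X - WH||^2 + lam * ||Y - CH||^2 + mu * ||Z - WB||^2, minimized with standard multiplicative updates. The function names, the matrices C and B, and the weights lam and mu are illustrative assumptions, not the paper's notation or implementation.

```python
import numpy as np


def objective(X, Y, Z, W, H, C, B, lam, mu):
    """Assumed loss: reconstruction + label term + seed-word term."""
    return (np.linalg.norm(X - W @ H) ** 2
            + lam * np.linalg.norm(Y - C @ H) ** 2
            + mu * np.linalg.norm(Z - W @ B) ** 2)


def gssnmf_sketch(X, Y, Z, k, lam=1.0, mu=1.0, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical guided, semi-supervised NMF via multiplicative updates.

    X: (words x docs) non-negative data matrix
    Y: (classes x docs) non-negative label matrix, e.g. one-hot columns
    Z: (words x seeds) non-negative seed-word indicator matrix
    k: number of topics
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    W = rng.random((n_words, k))       # word-topic matrix
    H = rng.random((k, n_docs))        # topic-document matrix
    C = rng.random((Y.shape[0], k))    # maps topics to class scores
    B = rng.random((k, Z.shape[1]))    # maps topics to seed-word groups

    for _ in range(n_iter):
        # Each step is a Lee-Seung style update for one block of variables;
        # eps guards against division by zero.
        W *= (X @ H.T + mu * Z @ B.T) / (W @ (H @ H.T) + mu * W @ (B @ B.T) + eps)
        H *= (W.T @ X + lam * C.T @ Y) / ((W.T @ W) @ H + lam * (C.T @ C) @ H + eps)
        C *= (Y @ H.T) / (C @ (H @ H.T) + eps)
        B *= (W.T @ Z) / ((W.T @ W) @ B + eps)
    return W, H, C, B
```

Under these assumptions, the columns of W can be read as topics, and a document's predicted class is the row index of the largest entry in the corresponding column of C @ H. Larger lam puts more weight on matching the labels, and larger mu pulls topics toward the seed words.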


