Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review

12/19/2019
by Robert Keeling, et al.

Research has shown that Convolutional Neural Networks (CNN) can be effectively applied to text classification as part of a predictive coding protocol. However, most research to date has been conducted on data sets of short documents that do not reflect the variety of documents in real-world document reviews. Using data from four actual reviews with documents of varying lengths, we compared CNN with other popular machine learning algorithms for text classification, including Logistic Regression, Support Vector Machine, and Random Forest. For each data set, classification models were trained with each learning algorithm at several training sample sizes. These models were then evaluated on a large, randomly sampled test set of documents, and the results were compared using precision and recall curves. Our study demonstrates that CNN performed well, but that no single algorithm performed best across all combinations of data set and training sample size. These results will help advance research into the legal profession's use of machine learning algorithms that maximize performance.
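
As a rough illustration of the evaluation step described in the abstract, the sketch below computes a precision and recall curve from a model's scores on a labeled test set. It is not the authors' code: the function names, the scores, and the labels are hypothetical, and it assumes each model outputs one numeric score per document, with higher scores meaning more likely responsive.

```python
from typing import List, Tuple


def precision_recall_curve(
    scores: List[float], labels: List[int]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) at every rank cutoff, highest score first.

    labels are 1 for responsive documents and 0 otherwise.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("test set contains no responsive documents")
    # Rank documents by model score, most likely responsive first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        curve.append((true_positives / total_relevant, true_positives / rank))
    return curve


def precision_at_recall(
    curve: List[Tuple[float, float]], target_recall: float
) -> float:
    """Precision at the first cutoff whose recall reaches target_recall."""
    for recall, precision in curve:
        if recall >= target_recall:
            return precision
    return 0.0


# Hypothetical test set of five documents scored by one model.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.2, 0.6]
curve = precision_recall_curve(scores, labels)
print(precision_at_recall(curve, 1.0))  # precision when all responsive docs are found
```

Running the same two functions on the scores from each algorithm and training sample size would give one curve per model, which can then be compared at a fixed recall level.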

research
04/03/2019

Empirical Study of Deep Learning for Text Classification in Legal Document Review

Predictive coding has been widely used in legal matters to find relevant...
research
12/19/2019

A Framework for Explainable Text Classification in Legal Document Review

Companies regularly spend millions of dollars producing electronically-s...
research
04/03/2019

Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding

In today's legal environment, lawsuits and regulatory investigations req...
research
10/25/2017

Re-evaluating the need for Modelling Term-Dependence in Text Classification Problems

A substantial amount of research has been carried out in developing mach...
research
02/27/2018

Convolutional Neural Networks for Toxic Comment Classification

Flood of information is produced in a daily basis through the global Int...
research
03/21/2019

Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding

Training documents have a significant impact on the performance of predi...
research
02/09/2021

CNN Application in Detection of Privileged Documents in Legal Document Review

Protecting privileged communications and data from disclosure is paramou...
