Is there something I'm missing? Topic Modeling in eDiscovery

07/30/2020
by Herbert L. Roitblat, et al.

In legal eDiscovery, the parties are required to search through their electronically stored information to find documents that are relevant to a specific case. Negotiations over the scope of these searches are often driven by a fear that something will be missed. This paper continues an argument that discovery should be based on identifying the facts of a case. If a search process is less than complete (if it has Recall less than 100%), it may still be complete in presenting all of the relevant available topics. In this study, Latent Dirichlet Allocation was used to identify 100 topics from all of the known relevant documents. The documents were then categorized to about 80% Recall (i.e., 80% of the relevant documents were found by the categorizer and designated the hit set, and 20% were missed and designated the missed set). Although the categorizer did not identify all of the relevant documents, the documents it did identify contained all of the topics derived from the full set of documents. This same pattern held whether the categorizer was a naïve Bayes categorizer trained on a random selection of documents or a Support Vector Machine trained with Continuous Active Learning (which focuses evaluation on the most-likely-to-be-relevant documents). No topics were identified in either categorizer's missed set that were not already seen in the hit set. Not only is a computer-assisted search process reasonable (as required by the Federal Rules of Civil Procedure), it is also complete when measured by topics.
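
The comparison described in the abstract, checking whether the missed set contains any topic that is absent from the hit set, can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes per-document topic weights have already been computed (for example by an LDA model), and the function names, the toy data, and the 0.2 presence threshold are all hypothetical choices made for the example; the paper may define topic presence differently.

```python
from typing import Dict, Iterable, List, Set

def topics_present(doc_topics: Dict[str, List[float]],
                   doc_ids: Iterable[str],
                   threshold: float = 0.2) -> Set[int]:
    """Topic indices whose weight meets the (assumed) threshold in at least one document."""
    present: Set[int] = set()
    for doc_id in doc_ids:
        for topic, weight in enumerate(doc_topics[doc_id]):
            if weight >= threshold:
                present.add(topic)
    return present

def recall(hit_ids: Iterable[str], relevant_ids: Iterable[str]) -> float:
    """Fraction of the known relevant documents that the categorizer found."""
    relevant = set(relevant_ids)
    return len(relevant & set(hit_ids)) / len(relevant)

def uncovered_topics(doc_topics: Dict[str, List[float]],
                     hit_ids: Iterable[str],
                     missed_ids: Iterable[str],
                     threshold: float = 0.2) -> Set[int]:
    """Topics that appear in the missed set but in no document of the hit set."""
    in_hits = topics_present(doc_topics, hit_ids, threshold)
    in_missed = topics_present(doc_topics, missed_ids, threshold)
    return in_missed - in_hits

# Toy data (hypothetical): five relevant documents, three topics.
doc_topics = {
    "d1": [0.80, 0.10, 0.10],
    "d2": [0.10, 0.80, 0.10],
    "d3": [0.05, 0.15, 0.80],
    "d4": [0.60, 0.30, 0.10],
    "d5": [0.20, 0.70, 0.10],
}
hit_set = ["d1", "d2", "d3", "d4"]   # documents the categorizer found
missed_set = ["d5"]                  # relevant documents it missed

toy_recall = recall(hit_set, doc_topics.keys())
toy_uncovered = uncovered_topics(doc_topics, hit_set, missed_set)
print(f"Recall: {toy_recall:.2f}, topics only in missed set: {sorted(toy_uncovered)}")
```

An empty result from `uncovered_topics` corresponds to the abstract's claim that the hit set is complete when measured by topics, even though Recall is below 100%.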

research
09/16/2021

FOMO: Topics versus documents in legal eDiscovery

In the United States, the parties to a lawsuit are required to search th...
research
01/28/2022

Probably Reasonable Search in eDiscovery

In eDiscovery, a party to a lawsuit or similar action must search throug...
research
10/12/2018

Technology Assisted Reviews: Finding the Last Few Relevant Documents by Asking Yes/No Questions to Reviewers

The goal of a technology-assisted review is to achieve high recall with ...
research
11/25/2019

Discovering topics with neural topic models built from PLSA assumptions

In this paper we present a model for unsupervised topic discovery in tex...
research
03/31/2021

Topic Scaling: A Joint Document Scaling – Topic Model Approach To Learn Time-Specific Topics

This paper proposes a new methodology to study sequential corpora by imp...
research
12/28/2022

Choosing the Number of Topics in LDA Models – A Monte Carlo Comparison of Selection Criteria

Selecting the number of topics in LDA models is considered to be a diffi...
research
06/18/2021

Heuristic Stopping Rules For Technology-Assisted Review

Technology-assisted review (TAR) refers to human-in-the-loop active lear...
