Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

12/04/2020
by J. Liu, et al.

In the medical field, text classification is one of the most important tasks, and it can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand or manually fine-tune the classification results for better precision and recall, because of the black-box nature of the learned models. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions. The evolved expressions classify a given medical text inquiry with satisfactory precision and recall, while allowing humans to read the classifier and fine-tune it if necessary. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), our method evolves the population according to a chosen fitness function, using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked, and updated by medical doctors, a property that is crucial in medical practice.
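The evolutionary loop described in the abstract (seed population, fitness function, reproduction operators) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not describe the paper's regular expression syntax or its operators, so this sketch substitutes a plain keyword alternation as the individual, F1 score as the fitness function, truncation selection, and simple toggle-mutation and uniform-crossover operators. The data set, `VOCAB`, and all function names are hypothetical.

```python
import random
import re

# Toy labelled inquiries (invented for illustration, not from the paper).
DATA = [
    ("I have a fever and a bad cough", True),
    ("persistent cough for two weeks", True),
    ("sore throat and fever since monday", True),
    ("my knee hurts after running", False),
    ("itchy skin rash on my arm", False),
    ("lower back pain when sitting", False),
]
# Hypothetical keyword pool that mutation draws from.
VOCAB = ["fever", "cough", "throat", "knee", "rash", "pain", "skin", "back"]


def to_regex(terms):
    """Render an individual (a set of keywords) as a readable alternation."""
    return r"\b(?:" + "|".join(sorted(re.escape(t) for t in terms)) + r")\b"


def f1(terms, data=DATA):
    """Fitness (assumed): F1 score of the regex used as a binary classifier."""
    if not terms:
        return 0.0
    pattern = re.compile(to_regex(terms))
    tp = fp = fn = 0
    for text, label in data:
        hit = pattern.search(text) is not None
        tp += hit and label
        fp += hit and not label
        fn += (not hit) and label
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def mutate(terms, rng):
    """Toggle one vocabulary word in or out of the individual."""
    return frozenset(set(terms) ^ {rng.choice(VOCAB)})


def crossover(a, b, rng):
    """Child keeps each keyword of either parent with probability 0.5."""
    return frozenset(t for t in a | b if rng.random() < 0.5)


def evolve(seed_population, generations=30, seed=0):
    """Evolve keyword-set individuals; the fittest half always survives."""
    rng = random.Random(seed)
    population = [frozenset(p) for p in seed_population]
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=f1, reverse=True)
        parents = ranked[: max(2, size // 2)]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=f1)
```

Calling `evolve([{"pain"}, {"knee"}, {"fever"}, {"rash"}])` returns a keyword set, and `to_regex` turns it into a pattern a clinician could read and edit by hand, which is the interpretability argument the abstract makes. Because the best individual is always carried over, fitness never decreases from one generation to the next in this sketch.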

research
11/16/2020

Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing and Word-vector Models

In this paper, we propose a rule-based engine composed of high quality a...
research
04/13/2017

A Search for Improved Performance in Regular Expressions

The primary aim of automated performance improvement is to reduce the ru...
research
02/18/2021

Regular Expressions for Fast-response COVID-19 Text Classification

Text classifiers are at the core of many NLP applications and use a vari...
research
04/01/2020

An Improved Classification Model for Igbo Text Using N-Gram And K-Nearest Neighbour Approaches

This paper presents an improved classification model for Igbo text using...
research
12/03/2020

Evolving Character-Level DenseNet Architectures using Genetic Programming

DenseNet architectures have demonstrated impressive performance in image...
research
11/24/2022

Reducing a Set of Regular Expressions and Analyzing Differences of Domain-specific Statistic Reporting

Due to the large amount of daily scientific publications, it is impossib...
research
05/06/2020

Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder

Regular expression is important for many natural language processing tas...
