MLRegTest: A Benchmark for the Machine Learning of Regular Languages

04/16/2023
by   Sam van der Poel, et al.

Evaluating machine learning (ML) systems on their ability to learn known classifiers allows fine-grained examination of the patterns they can learn, which builds confidence when they are applied to the learning of unknown classifiers. This article presents a new benchmark for ML systems on sequence classification called MLRegTest, which contains training, development, and test sets from 1,800 regular languages. Different kinds of formal languages represent different kinds of long-distance dependencies, and correctly identifying long-distance dependencies in sequences is a known obstacle to successful generalization by ML systems. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). Together, the logical complexity and the choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies. Finally, the performance of different neural networks (simple RNN, LSTM, GRU, transformer) on MLRegTest is examined. The main conclusion is that their performance depends significantly on the kind of test set, the class of language, and the neural network architecture.
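To make the three kinds of literals concrete, the following is a minimal Python sketch, not code from MLRegTest itself. It assumes the usual subregular reading of the terms: a string literal is a contiguous factor, a subsequence literal allows gaps, and a tier-string literal is a contiguous factor after all symbols outside a chosen tier are erased. The function names and the example alphabet are hypothetical, chosen only for illustration.

```python
def contains_substring(word: str, factor: str) -> bool:
    """String literal: `factor` occurs as a contiguous block of `word`."""
    return factor in word


def contains_subsequence(word: str, pattern: str) -> bool:
    """Subsequence literal: the symbols of `pattern` occur in `word`
    in order, with arbitrary material allowed between them."""
    remaining = iter(word)
    # Each `in` test consumes the iterator up to the first match,
    # so later symbols must be found strictly after earlier ones.
    return all(symbol in remaining for symbol in pattern)


def contains_tier_substring(word: str, factor: str, tier: set) -> bool:
    """Tier-string literal: erase every symbol not on `tier`, then
    look for `factor` as a contiguous block of what is left."""
    projected = "".join(symbol for symbol in word if symbol in tier)
    return factor in projected


if __name__ == "__main__":
    word = "adccb"
    print(contains_substring(word, "ab"))                     # contiguous "ab"
    print(contains_subsequence(word, "ab"))                   # "a" ... "b"
    print(contains_tier_substring(word, "ab", {"a", "b"}))    # tier ignores c, d
    print(contains_tier_substring(word, "ab", {"a", "b", "d"}))  # d blocks
```

In this sketch the distance between the two symbols does not matter for the subsequence and tier-string checks, which is the sense in which such literals encode long-distance dependencies. The tier version differs from the subsequence version in that a symbol on the tier (here `d`) can intervene and block the match.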


