A Study of the Learnability of Relational Properties (Model Counting Meets Machine Learning)

12/25/2019
by Muhammad Usman, et al.

Relational properties, e.g., the connectivity structure of nodes in a distributed system, have many applications in software design and analysis. However, such properties often have to be written manually, which can be costly and error-prone. This paper introduces the MCML approach for empirically studying the learnability of a key class of such properties that can be expressed in the well-known software design language Alloy. A key novelty of MCML is quantification of the performance of and semantic differences among trained machine learning (ML) models, specifically decision trees, with respect to entire input spaces (up to a bound on the input size), and not just for given training and test datasets (as is the common practice). MCML reduces the quantification problems to the classic complexity theory problem of model counting, and employs state-of-the-art approximate and exact model counters for high efficiency. The results show that relatively simple ML models can achieve surprisingly high performance (accuracy and F1 score) at learning relational properties when evaluated in the common setting of using training and test datasets – even when the training dataset is much smaller than the test dataset – indicating the seeming simplicity of learning these properties. However, the use of MCML metrics based on model counting shows that the performance can degrade substantially when tested against the whole (bounded) input space, indicating the high complexity of precisely learning these properties, and the usefulness of model counting in quantifying the true accuracy.
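To make the contrast between dataset-based and whole-space evaluation concrete, here is a minimal sketch of the whole-space metrics for a very small bound. It is not the MCML implementation: where MCML uses model counters, this sketch counts by brute-force enumeration, which is only feasible for tiny inputs. The property (symmetry of a relation over 3 nodes), the bound `N`, and `toy_classifier` (a hand-written stand-in for a trained decision tree) are all assumptions chosen for illustration.

```python
from itertools import product

N = 3  # hypothetical bound on the number of nodes


def is_symmetric(bits, n=N):
    """Ground-truth relational property: the n x n adjacency matrix is symmetric."""
    return all(bits[i * n + j] == bits[j * n + i]
               for i in range(n) for j in range(n))


def toy_classifier(bits, n=N):
    """Hypothetical stand-in for a learned decision tree.

    It inspects only one pair of edges, (0,1) and (1,0), so it
    over-approximates symmetry.
    """
    return bits[0 * n + 1] == bits[1 * n + 0]


def whole_space_counts(truth, predict, n=N):
    """Count TP, FP, TN, FN over every adjacency matrix of size n x n.

    Brute-force enumeration replaces the model counter here; the
    quantities being counted are the same confusion-matrix entries.
    """
    tp = fp = tn = fn = 0
    for bits in product((0, 1), repeat=n * n):
        t, p = truth(bits, n), predict(bits, n)
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_and_f1(tp, fp, tn, fn):
    """Standard accuracy and F1 computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    counts = whole_space_counts(is_symmetric, toy_classifier)
    print("TP, FP, TN, FN:", counts)
    print("accuracy, F1:", accuracy_and_f1(*counts))
```

In this toy setting the classifier never misses a symmetric relation, yet it accepts many non-symmetric ones, so its F1 score over the whole bounded space is far lower than a small, favorable test set might suggest. Enumeration grows as 2^(n*n), which is why counting-based techniques are needed at realistic bounds.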


