GPTIPS 2: an open-source software platform for symbolic data mining

12/15/2014
by Dominic P. Searson, et al.

GPTIPS is a free, open-source, MATLAB-based software platform for symbolic data mining (SDM). It uses a multigene variant of the biologically inspired machine learning method of genetic programming (MGGP) as the engine that drives the automatic model discovery process. Symbolic data mining is the process of extracting hidden, meaningful relationships from data in the form of symbolic equations. In contrast to other data-mining methods, the structural transparency of the generated predictive equations can give new insights into the physical systems or processes that generated the data. This transparency also makes the models very easy to deploy outside of MATLAB. The rationale behind GPTIPS is to reduce the technical barriers to using, understanding, visualising and deploying GP-based symbolic models of data, while remaining highly customisable and delivering robust numerical performance for power users. In this chapter, notable new features of the latest version of the software are discussed with these aims in mind. Additionally, a simplified variant of the MGGP high-level gene crossover mechanism is proposed. It is demonstrated that the new functionality of GPTIPS 2 (a) facilitates the discovery of compact symbolic relationships from data using multiple approaches, e.g. using novel gene-centric visualisation analysis to mitigate horizontal bloat and reduce complexity in multigene symbolic regression models; (b) provides numerous methods for visualising the properties of symbolic models; (c) emphasises the generation of graphically navigable libraries of models that are optimal in terms of the Pareto trade-off surface of model performance and complexity; and (d) expedites real-world applications by the simple, rapid and robust deployment of symbolic models outside the software environment they were developed in.
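The idea in (c) of keeping only models that are optimal on the trade-off between performance and complexity can be illustrated with a small sketch. This is not GPTIPS code and does not use its API: the function name `pareto_front`, the model labels, and the complexity and error values are all hypothetical, and Python is used purely for illustration. It assumes each candidate model is summarised by a complexity score and a prediction error, with lower being better for both.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate model: (label, complexity, error).
# Lower is better for both complexity and error.
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that no other model dominates.

    Model B dominates model A if B is no worse than A on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for name, complexity, error in models:
        dominated = any(
            (c2 <= complexity and e2 <= error)
            and (c2 < complexity or e2 < error)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, complexity, error))
    # Sort by complexity so the front reads from simplest to most complex.
    return sorted(front, key=lambda m: m[1])


# Made-up candidates, for illustration only.
candidates: List[Model] = [
    ("m1", 5, 0.40),
    ("m2", 12, 0.22),
    ("m3", 12, 0.30),  # same complexity as m2 but higher error
    ("m4", 30, 0.21),
    ("m5", 35, 0.25),  # more complex and less accurate than m4
]

front = pareto_front(candidates)
```

With these made-up values, `m3` and `m5` are dropped because another candidate is at least as simple and at least as accurate. The remaining models form the library a user would browse when choosing how much complexity to accept for a given gain in accuracy.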

