A general methodology to assess symbolic regression algorithms using the generation of random equations with uniform random sampling

06/18/2019
by Sohrab Towfighi, et al.

Symbolic regression is the task of finding the equation that best fits a given dataset. Symbolic regression problems are typically solved using genetic algorithms. As a metaheuristic approach to global optimization, genetic algorithms were once regarded as a panacea for most computational problems. This paper presents a methodology to compare symbolic regression algorithms. The combinatorics of the problem space is explored, and a novel method is described that allows users to count the number of possible equations in a defined problem space. The generation of full binary trees is discussed using a little-known but remarkably simple dense enumeration that maps integers to unique binary trees. Though the set of all possible equations is infinite, the number of equations becomes finite and well defined once we limit our search to N binary trees, n functions, and m terminals. We provide a methodology for uniform random sampling from this large but finite set of equations. Using thousands of randomly generated experiments and arguments from elementary statistics, we examine whether a simple evolutionary algorithm outperforms random search. The methodology is generalizable and can be applied to compare symbolic regression algorithms.
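
The abstract does not spell out the enumeration or the counting rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's implementation. It assumes a bit-deinterleaving map from integers to full binary trees (the paper's enumeration may differ), assumes every function takes exactly two arguments, and uses hypothetical names (`tree_from_int`, `count_equations`, `sample_equation`). Under those assumptions a tree with I internal nodes has I + 1 leaves and admits n^I * m^(I+1) labelings, so sampling a shape with probability proportional to that count and then labeling it uniformly gives a uniform draw over all equations in the first N trees.

```python
import random


def deinterleave(k):
    """Split the bits of k into two integers: even-position bits and odd-position bits."""
    a = b = 0
    shift = 0
    while k:
        a |= (k & 1) << shift
        b |= ((k >> 1) & 1) << shift
        k >>= 2
        shift += 1
    return a, b


def tree_from_int(i):
    """Map a non-negative integer to a full binary tree.

    Assumed encoding (not necessarily the paper's): None is a leaf,
    a 2-tuple (left, right) is an internal node.
    """
    if i == 0:
        return None
    left, right = deinterleave(i - 1)  # both parts are < i, so recursion terminates
    return (tree_from_int(left), tree_from_int(right))


def internal_nodes(tree):
    """Count internal nodes; a full binary tree then has internal_nodes + 1 leaves."""
    if tree is None:
        return 0
    return 1 + internal_nodes(tree[0]) + internal_nodes(tree[1])


def labelings(tree, n, m):
    """Number of equations on one shape, assuming n binary functions and m terminals."""
    k = internal_nodes(tree)
    return n ** k * m ** (k + 1)


def count_equations(N, n, m):
    """Total equations over the first N tree shapes (integers 0 .. N-1)."""
    return sum(labelings(tree_from_int(i), n, m) for i in range(N))


def label(tree, functions, terminals, rng):
    """Assign a uniformly random function or terminal to every node."""
    if tree is None:
        return rng.choice(terminals)
    return (
        rng.choice(functions),
        label(tree[0], functions, terminals, rng),
        label(tree[1], functions, terminals, rng),
    )


def sample_equation(N, functions, terminals, rng=random):
    """Draw one equation uniformly from all equations on the first N shapes."""
    shapes = [tree_from_int(i) for i in range(N)]
    weights = [labelings(t, len(functions), len(terminals)) for t in shapes]
    shape = rng.choices(shapes, weights=weights, k=1)[0]
    return label(shape, functions, terminals, rng)
```

For instance, `sample_equation(100, ["add", "mul"], ["x", "1"])` returns either a bare terminal or a nested tuple of the form `(function, left, right)`. Weighting the shape choice by its labeling count is what keeps the draw uniform over equations rather than over shapes.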
