Automata Learning: An Algebraic Approach
We propose a generic categorical framework for learning unknown formal languages of various types (e.g. finite or infinite words, trees, weighted and nominal languages). Our approach is parametric in a monad T that represents the given type of languages and their recognizing algebraic structures. Using the concept of an automata presentation of T-algebras, we demonstrate that the task of learning a T-recognizable language can be reduced to learning an abstract form of automaton, which is achieved via a generalized version of Angluin's L* algorithm. The algorithm is phrased in terms of categorically described extension steps; we provide a generic termination and complexity analysis based on a dedicated notion of finiteness. Our framework applies to structures such as tree languages and omega-regular languages that were not within the scope of existing categorical accounts of automata learning. In addition, it yields new generic learning algorithms for several types of languages for which no such algorithms were previously known, including sorted languages, nominal languages with name binding, and cost functions.