Declarative Statistical Modeling with Datalog

12/06/2014
by Vince Bárány, et al.

Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics. Our Datalog extension provides convenient mechanisms to include numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program; these outcomes are minimal solutions with respect to a related program with existentially quantified variables in conclusions. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. We focus on programs that use discrete numerical distributions, but even then the space of possible outcomes may be uncountable (as a solution can be infinite). We define a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We show that the resulting semantics is robust under different chases. We also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). We argue that the framework we propose retains the purely declarative nature of Datalog, and allows for natural specifications of statistical models.
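The abstract describes the semantics only in prose. The following is a hypothetical illustration, not the authors' syntax or implementation. It shows a toy program whose rule conclusions draw values from a discrete `Flip[p]` distribution, and a naive probabilistic chase that branches on each draw and enumerates the possible outcomes, which are finite here. An observation is handled as a constraint: outcomes that violate it are discarded and the rest are renormalized. The relation names (`City`, `Home`, `Earthquake`, `Burglary`, `Alarm`), the rates, the `:-` rule notation in the comments and the trigger order are all assumptions made for the sketch. Because every outcome of this program is finite, the probability space is discrete and no cylinder-set construction is needed.

```python
from fractions import Fraction

# Hypothetical toy input database (EDB): one city with a burglary rate,
# and one home located in that city. Facts are (relation, arg1, arg2, ...).
EDB = frozenset({
    ("City", "napa", Fraction(3, 100)),
    ("Home", "h1", "napa"),
})


def has_fact_with_prefix(db, prefix):
    """True if some fact in db starts with the given tuple prefix."""
    n = len(prefix)
    return any(f[:n] == prefix for f in db)


def probabilistic_triggers(db):
    """Firings of the hypothetical Flip rules, as (prefix, p) pairs:
        Earthquake(c, Flip[0.01]) :- City(c, r)
        Burglary(h, Flip[r])      :- Home(h, c), City(c, r)
    The prefix is the conclusion without its random last argument, which
    plays the role of an existentially quantified variable."""
    out = []
    for f in db:
        if f[0] == "City":
            out.append((("Earthquake", f[1]), Fraction(1, 100)))
    for home in (f for f in db if f[0] == "Home"):
        for city in (f for f in db if f[0] == "City" and f[1] == home[2]):
            out.append((("Burglary", home[1]), city[2]))
    return sorted(out)  # a fixed order makes this chase deterministic


def apply_deterministic(db):
    """Close db under the hypothetical deterministic rules:
        Alarm(h) :- Home(h, c), Earthquake(c, 1)
        Alarm(h) :- Burglary(h, 1)"""
    db = set(db)
    changed = True
    while changed:
        new = set()
        for f in db:
            if f[0] == "Home" and ("Earthquake", f[2], 1) in db:
                new.add(("Alarm", f[1]))
            if f[0] == "Burglary" and f[2] == 1:
                new.add(("Alarm", f[1]))
        changed = not new <= db
        db |= new
    return frozenset(db)


def chase(db, prob=Fraction(1)):
    """Naive probabilistic chase: returns {outcome: probability}."""
    db = apply_deterministic(db)
    for prefix, p in probabilistic_triggers(db):
        if not has_fact_with_prefix(db, prefix):
            # Flip[p] yields 1 with probability p and 0 otherwise.
            outcomes = {}
            for value, q in ((1, p), (0, 1 - p)):
                for o, w in chase(db | {prefix + (value,)}, prob * q).items():
                    outcomes[o] = outcomes.get(o, 0) + w
            return outcomes
    return {db: prob}  # no unfired trigger left: db is a possible outcome


def conditional(outcomes, event, evidence):
    """P(event | evidence): keep outcomes satisfying the observation and
    renormalize (an assumed reading of observations as constraints)."""
    num = sum((w for o, w in outcomes.items() if event(o) and evidence(o)), Fraction(0))
    den = sum((w for o, w in outcomes.items() if evidence(o)), Fraction(0))
    return num / den


outcomes = chase(EDB)
p_quake = conditional(
    outcomes,
    event=lambda o: ("Earthquake", "napa", 1) in o,
    evidence=lambda o: ("Alarm", "h1") in o,
)
print(p_quake)  # 100/397
```

The chase yields four outcomes whose probabilities sum to 1. Observing the alarm raises the earthquake probability from 1/100 to 100/397. For this program, dropping the `sorted` call, and so changing the order in which triggers fire, leaves the resulting distribution unchanged. This matches, in miniature, the abstract's claim that the semantics is robust under different chases.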
