New models for symbolic data analysis

09/11/2018
by Boris Beranger, et al.

Symbolic data analysis (SDA) is an emerging area of statistics in which individual-level data are aggregated into group-based distributional summaries (symbols), and statistical methods are then developed to analyse these symbols. It is ideal for analysing large and complex datasets, and has immense potential to become a standard inferential technique in the near future. However, existing SDA techniques are often non-inferential, do not easily permit meaningful statistical models, are unable to distinguish between competing models, or rely on simplifying assumptions that are known to be false. Further, the procedure for constructing symbols from the underlying data is erroneously regarded as irrelevant to the resulting statistical analysis. In this paper we introduce a new general method for constructing likelihood functions for symbolic data. The likelihood is based on a desired probability model for the underlying classical data, even though only the distributional summaries are observed. This approach resolves many of the conceptual and practical issues with current SDA methods and opens the door to new classes of symbol design and construction. It also develops SDA into a viable tool that can enable and improve upon classical data analyses, particularly for very large and complex datasets. This work creates a new direction for SDA research, which we illustrate through several real and simulated data analyses.
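The following sketch illustrates the general idea of deriving a likelihood for the observed symbols from a model for the unobserved classical data. It uses the simplest case and rests on several assumptions. Each symbol is taken to be an interval (min, max) computed from n iid draws from a Normal(mu, sigma) model. Under that model, the likelihood of one symbol is the joint density of the two extreme order statistics, n(n-1) g(a) g(b) [G(b) - G(a)]^(n-2), where g and G are the model density and CDF. The function names, the normal model, the simulated data and the grid search are all hypothetical choices made for this example. This is not the authors' implementation.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # Log-density of Normal(mu, sigma) at x.
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    # CDF of Normal(mu, sigma) via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_symbol_loglik(symbols, mu, sigma):
    """Hypothetical helper: log-likelihood of interval symbols (min, max, n),
    assuming the n hidden values behind each symbol are iid Normal(mu, sigma)."""
    total = 0.0
    for lo, hi, n in symbols:
        if n < 2 or not lo < hi:
            raise ValueError("each symbol needs n >= 2 and min < max")
        mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        if mass <= 0.0:
            return float("-inf")  # numerically impossible under these parameters
        # Joint density of (min, max): n(n-1) g(lo) g(hi) [G(hi) - G(lo)]^(n-2)
        total += (math.log(n) + math.log(n - 1)
                  + normal_logpdf(lo, mu, sigma) + normal_logpdf(hi, mu, sigma)
                  + (n - 2) * math.log(mass))
    return total

# Simulated example: 50 groups of 30 values, of which only (min, max, n) is kept.
random.seed(1)
true_mu, true_sigma, n = 5.0, 2.0, 30
symbols = []
for _ in range(50):
    xs = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    symbols.append((min(xs), max(xs), n))

# Crude grid-search maximum likelihood estimate of (mu, sigma) from the symbols.
grid_mu = [3.0 + 0.05 * i for i in range(81)]      # 3.0 .. 7.0
grid_sigma = [1.0 + 0.05 * j for j in range(41)]   # 1.0 .. 3.0
best = max(((m, s) for m in grid_mu for s in grid_sigma),
           key=lambda p: interval_symbol_loglik(symbols, *p))
print("estimated (mu, sigma):", best)
```

The likelihood depends on how each symbol was built. If the same data were summarised by a histogram instead, a different (multinomial) likelihood over the bin probabilities would be needed. This is one concrete sense in which symbol construction matters for the resulting analysis.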
