Extracting the main trend in a dataset: the Sequencer algorithm

06/24/2020
by Dalya Baron, et al.

Scientists aim to extract simplicity from observations of the complex world. An important component of this process is the exploration of data in search of trends. In practice, however, this tends to be more of an art than a science. Among all trends existing in the natural world, one-dimensional trends, often called sequences, are of particular interest because they provide insights into simple phenomena. However, some are challenging to detect because they may be expressed in complex ways. We present the Sequencer, an algorithm designed to generically identify the main trend in a dataset. It does so by constructing graphs that describe the similarities between pairs of observations, computed with a set of metrics and scales. Using the fact that continuous trends lead to more elongated graphs, the algorithm can identify which aspects of the data are relevant in establishing a global sequence. Such an approach can be used beyond the proposed algorithm and can optimize the parameters of any dimensionality reduction technique. We demonstrate the power of the Sequencer using real-world data from astronomy and geology, as well as images from the natural world. We show that, in a number of cases, it outperforms the popular t-SNE and UMAP dimensionality reduction techniques. This approach to exploratory data analysis, which relies on neither training nor the tuning of any parameter, has the potential to enable discoveries in a wide range of scientific domains. The source code is available on GitHub, and we provide an online interface at <http://sequencer.org>.
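To make the idea of "more elongated graphs" concrete, here is a minimal Python sketch. It is not the authors' implementation or the API of the Sequencer package. The abstract does not say which graph or which elongation measure is used, so the sketch assumes a minimum spanning tree built from Euclidean distances and scores elongation as the number of edges on the tree's longest path divided by the total number of edges (1.0 for a perfect chain, lower for compact or star-like trees). The function names `minimum_spanning_tree`, `bfs_depths`, and `elongation_and_order` are hypothetical.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points, metric=euclidean):
    """Prim's algorithm on the complete distance graph; returns adjacency lists."""
    n = len(points)
    adjacency = [[] for _ in range(n)]
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from each node to the tree
    best_parent = [-1] * n       # tree node that provides that link
    best_dist[0] = 0.0
    for _ in range(n):
        # Attach the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            adjacency[u].append(best_parent[u])
            adjacency[best_parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = metric(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return adjacency


def bfs_depths(adjacency, start):
    """Number of edges from `start` to every node of the tree."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def elongation_and_order(points, metric=euclidean):
    """Return (elongation proxy, node order). Assumes at least two points."""
    adjacency = minimum_spanning_tree(points, metric)
    # Two BFS passes locate one end of the tree's longest path.
    first = bfs_depths(adjacency, 0)
    end = max(first, key=first.get)
    depth = bfs_depths(adjacency, end)
    # Assumed proxy: longest path length relative to the total edge count.
    elongation = max(depth.values()) / (len(points) - 1)
    # Order observations by their graph distance from that end.
    order = sorted(range(len(points)), key=lambda i: depth[i])
    return elongation, order


# A shuffled one-dimensional trend: points on a parabola, indexed by t.
ts = [3, 7, 0, 9, 4, 1, 8, 2, 6, 5]
curve = [(t, t * t) for t in ts]
curve_score, curve_order = elongation_and_order(curve)
recovered = [ts[i] for i in curve_order]

# A compact, star-like configuration with no underlying sequence.
star = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
star_score, _ = elongation_and_order(star)
```

In this toy setup the shuffled parabola yields a chain-like tree, so its score is 1.0 and the ordering recovers the underlying parameter `t` (up to direction). The star configuration scores 0.5. Comparing such scores across different metrics and scales is one way to read the abstract's claim that elongation indicates which aspects of the data carry a sequence.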
