A novel, computationally tractable algorithm flags in big matrices every column associated in any way with others or a dependent variable, with much higher power when columns a

02/21/2022
by Marcos A. Antezana, et al.

Exhaustively scanning a big data matrix (DM) for subsets of independent variables (IVs) that are associated with a dependent variable (DV) is computationally tractable only for 1- and 2-IV effects. I present a highly computationally tractable Participation-In-Association Score (PAS) that, in a DM with markers, flags every column that is strongly associated with others. PAS examines no column subsets, and its computational cost grows linearly with the number of DM columns, remaining reasonable even in million-column DMs. PAS exploits how associations of markers in DM rows cause associations of matches in the rows' pairwise comparisons. For every such comparison with a match at a tested column, PAS computes the other matches by modifying the comparison's total matches (scored once per DM), yielding a distribution of conditional matches that is perturbed by associations of the tested column. Equally tractable is dvPAS, which flags DV-associated IVs by permuting the markers in the DV. P values are obtained by permutation and Sidak-corrected for multiple tests, bypassing model selection. Simulations show that PAS and dvPAS i) generate uniform-(0,1)-distributed type I error in null DMs and ii) detect randomly encountered binary and trinary models of significant n-column association and of n-IV association with a binary DV, respectively, with power of the same order of magnitude as that of exhaustive evaluation and with false positives that are uniform-(0,1)-distributed or straightforwardly tuned to be so. Power to detect 2-way DV-associated 100-marker+ runs is non-parametrically ultimate, but power to detect pure n-column associations and pure n-IV DV associations falls exponentially as n increases. Power increases about twofold in trinary vs. binary DMs, and increases substantially when there are background associations such as those between mutations in chromosomes, especially in trinary DMs, where dvPAS filters said background most effectively.
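The conditional-match idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not state the summary statistic or the exact permutation scheme, so the sketch assumes (i) the mean of the conditional matches as the statistic, (ii) a null obtained by permuting the tested column, and (iii) a two-sided comparison against the null mean. The function names `pairwise_totals`, `pas_like_pvalue`, and `sidak` are hypothetical.

```python
import numpy as np

def pairwise_totals(dm):
    """Total matches for every row pair (i < j), scored once per matrix."""
    iu, ju = np.triu_indices(dm.shape[0], k=1)
    total = (dm[iu] == dm[ju]).sum(axis=1)
    return iu, ju, total

def pas_like_pvalue(dm, c, iu, ju, total, n_perm=200, seed=None):
    """Permutation p value for column c (assumed statistic: mean of other matches)."""
    rng = np.random.default_rng(seed)
    col = dm[:, c]
    # Matches at all other columns: subtract the tested column's own match.
    others = total - (col[iu] == col[ju]).astype(int)

    def stat(values):
        # Mean of the other matches over pairs that match at the tested column.
        sel = values[iu] == values[ju]
        return others[sel].mean() if sel.any() else others.mean()

    observed = stat(col)
    # Assumed null: shuffle the tested column, keep the rest of the matrix fixed.
    null = np.array([stat(rng.permutation(col)) for _ in range(n_perm)])
    centre = null.mean()
    extreme = np.sum(np.abs(null - centre) >= abs(observed - centre))
    return (1 + extreme) / (n_perm + 1)

def sidak(p, m):
    """Sidak correction of a p value for m tests."""
    return 1.0 - (1.0 - p) ** m

# Toy binary matrix: columns 0, 1 and 2 are identical, the rest are random.
rng = np.random.default_rng(0)
dm = rng.integers(0, 2, size=(60, 20))
dm[:, 1] = dm[:, 0]
dm[:, 2] = dm[:, 0]

iu, ju, total = pairwise_totals(dm)
p_planted = pas_like_pvalue(dm, 0, iu, ju, total, seed=1)
p_adjusted = sidak(p_planted, dm.shape[1])
print(p_planted, p_adjusted)
```

In this sketch the pairwise totals are computed once and reused for every tested column, so the per-column work does not depend on the number of columns. Enumerating all row pairs explicitly is a simplification chosen for readability.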


