Dynamic Kernel Matching for Non-conforming Data: A Case Study of T-cell Receptor Datasets

03/18/2021
by Jared Ostmeyer, et al.

Most statistical classifiers are designed to find patterns in data where numbers fit into rows and columns, like in a spreadsheet, but many kinds of data do not conform to this structure. To uncover patterns in such non-conforming data, we describe an approach for modifying established statistical classifiers, which we call dynamic kernel matching (DKM). As examples of non-conforming data, we consider (i) a dataset of T-cell receptor (TCR) sequences labelled by disease antigen and (ii) a dataset of sequenced TCR repertoires labelled by patient cytomegalovirus (CMV) serostatus, anticipating that both datasets contain signatures for diagnosing disease. We successfully fit statistical classifiers augmented with DKM to both datasets and report performance on holdout data using both standard metrics and metrics that allow for indeterminate diagnoses. Finally, we identify the patterns our classifiers use to generate predictions and show that these patterns agree with observations from experimental studies.
