Typesafe Coordinate Systems in High-Throughput Sequencing Applications

09/14/2022
by Charles Thomas Gregory, et al.

High-throughput sequencing file formats and tools encode coordinate intervals with respect to a reference sequence in at least four distinct, incompatible ways. Integrating data from different formats, or moving data between them, can introduce subtle off-by-one errors. Here, we introduce the notion of typesafe coordinates: a coordinate interval is not merely an integer pair but a member of a type class comprising four types, the Cartesian product of a zero or one basis and an open or closed interval end. By leveraging the type system of statically and strongly typed, compiled languages, we can provide static guarantees that this entire class of errors is eliminated. We provide a reference implementation in D as part of a larger work (dhtslib), and proofs of concept in Rust, OCaml, and Python. Exploratory implementations are available at https://github.com/blachlylab/typesafe-coordinates.
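
To make the idea concrete, the following is a minimal Rust sketch of an interval whose basis and end convention are part of its type. It is not taken from dhtslib or the linked repository: the names `Interval`, `ZeroBased`, `OneBased`, `HalfOpen`, `Closed`, `Basis`, `End`, and `convert` are all hypothetical, and the sketch assumes that "open" means the end position is excluded and that empty intervals are rejected for simplicity.

```rust
use std::marker::PhantomData;

// Hypothetical marker types: the basis and end convention live in the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ZeroBased;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OneBased;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HalfOpen;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Closed;

/// Coordinate of the first position on the reference (0 or 1).
pub trait Basis {
    const OFFSET: u64;
}
impl Basis for ZeroBased {
    const OFFSET: u64 = 0;
}
impl Basis for OneBased {
    const OFFSET: u64 = 1;
}

/// How far the stored end lies past the last included position.
pub trait End {
    const PAST_LAST: u64;
}
impl End for HalfOpen {
    const PAST_LAST: u64 = 1;
}
impl End for Closed {
    const PAST_LAST: u64 = 0;
}

/// An interval tagged with its coordinate system at the type level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval<B, E> {
    start: u64,
    end: u64,
    _system: PhantomData<(B, E)>,
}

impl<B: Basis, E: End> Interval<B, E> {
    /// Returns `None` if `start` precedes the basis or the interval is empty
    /// (a simplifying assumption of this sketch).
    pub fn new(start: u64, end: u64) -> Option<Self> {
        if start >= B::OFFSET && end + 1 > start + E::PAST_LAST {
            Some(Interval { start, end, _system: PhantomData })
        } else {
            None
        }
    }

    pub fn start(&self) -> u64 {
        self.start
    }

    pub fn end(&self) -> u64 {
        self.end
    }

    /// Number of positions covered; independent of the basis.
    pub fn len(&self) -> u64 {
        self.end + 1 - E::PAST_LAST - self.start
    }

    /// Explicit conversion to another coordinate system, routed through
    /// zero-based, half-open coordinates.
    pub fn convert<B2: Basis, E2: End>(self) -> Interval<B2, E2> {
        let start0 = self.start - B::OFFSET;
        let end0 = self.end + 1 - E::PAST_LAST - B::OFFSET;
        Interval {
            start: start0 + B2::OFFSET,
            end: end0 + B2::OFFSET + E2::PAST_LAST - 1,
            _system: PhantomData,
        }
    }

    /// Accepts only an interval of the same coordinate system, so comparing
    /// intervals from different systems is a compile-time error.
    pub fn overlaps(&self, other: &Self) -> bool {
        let past = 1 - E::PAST_LAST; // 1 if closed, 0 if half-open
        self.start < other.end + past && other.start < self.end + past
    }
}
```

Under these assumptions, `Interval::<ZeroBased, HalfOpen>::new(10, 20)` and `Interval::<OneBased, Closed>::new(11, 20)` describe the same ten positions, but they have different types. Passing one to `overlaps` on the other does not compile until the caller writes an explicit `convert()`, which is where the off-by-one arithmetic is confined.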


