1 SNV cycles
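For this article, let $(X, d)$ be a finite distance space, i.e. $X$ is a finite set and $d$ is a metric or, more generally, a semimetric on $X$ (a semimetric satisfies all the axioms of a metric with the exception of the triangle inequality). Recall that for every $r \geq 0$, the Vietoris-Rips complex of $X$ at scale $r$ is the abstract simplicial complex
$$\mathrm{VR}_r(X) = \{\, \emptyset \neq \sigma \subseteq X \mid d(x, y) \leq r \text{ for all } x, y \in \sigma \,\}.$$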
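Assume that we have a (time dependent) filtration of the underlying finite set $X$ by subsets
$$X_1 \subseteq X_2 \subseteq \dots \subseteq X_n.$$
For $t \in \{1, \dots, n\}$, consider the Vietoris-Rips filtration $(\mathrm{VR}_r(X_t))_{r \geq 0}$ of the $t$-th time step.
Denote by $H_1(-; \mathbb{F}_p)$ the first simplicial homology with coefficients in a finite prime field $\mathbb{F}_p$, applied to this filtration. Then $H_1((\mathrm{VR}_r(X_t))_{r \geq 0}; \mathbb{F}_p)$ is a finitely generated (f.g.) one-dimensional persistence module.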
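To make these objects concrete, the following minimal Python sketch builds the Vietoris-Rips complex (up to dimension 2) of a toy set of sequences under the Hamming distance, together with a time filtration by first appearance. The sequences, the arrival times and the helper names (`hamming`, `vietoris_rips`, `time_slice`) are hypothetical and only serve as an illustration; we assume the convention that a simplex is present at scale $r$ once all of its pairwise distances are at most $r$.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def vietoris_rips(points, dist, r, max_dim=2):
    """Simplices (as index tuples) of the Vietoris-Rips complex at scale r.

    Assumed convention: a set of points spans a simplex if and only if
    all of its pairwise distances are at most r.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k = number of vertices of the simplex
        for sigma in combinations(range(len(points)), k):
            if all(dist(points[i], points[j]) <= r
                   for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Hypothetical toy data: four sequences and the time step of their first appearance.
seqs = ["AA", "AC", "CC", "CA"]
arrival = [1, 1, 2, 3]

def time_slice(t):
    """Sequences that have appeared up to and including time step t."""
    return [s for s, a in zip(seqs, arrival) if a <= t]

for t in (1, 2, 3):
    complex_t = vietoris_rips(time_slice(t), hamming, r=1)
    edges = [s for s in complex_t if len(s) == 2]
    print(t, len(time_slice(t)), "points,", len(edges), "edges at scale 1")
```

At the last time step the four sequences form a closed cycle of four edges at scale 1, which is filled in by triangles at scale 2.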
As in the work of Bleher et al. [topologyidentifies2021], where the distance space is a finite set of SARS-CoV-2 RNA sequences equipped with the Hamming distance, we are interested in detecting cycles that correspond to bars in the barcode of a given time step that are born in the first filtration step. In [topologyidentifies2021], these cycles are called single nucleotide variation (SNV) cycles and are used for a topological recurrence (time series) analysis of SARS-CoV-2. For simplicity, we also call such cycles SNV cycles in our more general setting.
Definition 1.1 (SNV cycle).
Representatives of the homology classes underlying the bars in the barcode of a given time step that are born in the first filtration step are called SNV cycles in that time step.
For every time step, fix a full set of SNV cycle representatives extracted from the corresponding barcode. In [topologyidentifies2021], the barcodes are computed with Ripser [bauer2021ripser] and these sets of representatives are extracted from the Ripser output. Ripser is a highly optimised software tool, capable of processing hundreds of thousands of distinct RNA sequences [topologyidentifies2021]. However, this classical approach to a time series analysis has the following issues:
Computing each time step separately can be very time consuming when the number of time steps is large (e.g. a time series analysis over one year on a daily basis).
We are not able to track the time-stability of SNV cycles, i.e. whether the image of the homology class of an SNV cycle under the canonical homomorphism to the first homology of a later time step is zero or not.
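Since each time step is computed separately, the sets of representatives are not automatically compatible: take an SNV cycle representative of one time step and assume that the image of its homology class under the canonical homomorphism to a later time step is not zero. Then it may still happen that this cycle is not among the representatives extracted at the later time step.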
In Sections 2 and 3, we present a method that enables the extraction of SNV cycles for each time step with only one barcode computation. The resulting SNV cycles are automatically compatible and we can track their time-stability.
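The classical approach described above can be sketched as follows: for every time step, the first homology of the Vietoris-Rips complex at the first filtration step is computed from scratch. The sketch only counts SNV cycles, using that the number of bars born in the first filtration step equals the dimension of the first homology at that step; it does not extract representatives and is not a substitute for Ripser. The toy data and the function names are hypothetical, and coefficients are taken in $\mathbb{F}_2$ as an assumption.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_f2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    basis = {}  # position of leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def count_snv_cycles(points, dist, r1):
    """dim H_1 over F_2 of the Vietoris-Rips complex of `points` at scale r1."""
    n = len(points)
    edges = [e for e in combinations(range(n), 2)
             if dist(points[e[0]], points[e[1]]) <= r1]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(f in edge_index for f in combinations(t, 2))]
    # Boundary maps as bitmasks: an edge is the sum of its two vertices,
    # a triangle is the sum of its three edges.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_index[f] for f in combinations(t, 2)) for t in triangles]
    # dim H_1 = dim ker(d1) - rank(d2) = (#edges - rank(d1)) - rank(d2)
    return len(edges) - rank_f2(d1) - rank_f2(d2)

# Hypothetical toy data: four sequences and the time step of their first appearance.
seqs = ["AA", "AC", "CC", "CA"]
arrival = [1, 1, 2, 3]

counts = []
for t in (1, 2, 3):
    slice_t = [s for s, a in zip(seqs, arrival) if a <= t]
    counts.append(count_snv_cycles(slice_t, hamming, r1=1))  # recomputed per time step
print(counts)  # [0, 0, 1]
```

Every time step restarts from the raw distances, and nothing in this computation links a cycle found at one time step to the cycles found at other time steps.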
2 Dimension reduction
The Vietoris-Rips filtrations of the individual time steps naturally lead to a finite bifiltered simplicial complex. We obtain a f.g. two-dimensional persistence module which contains all the information that occurs within the barcodes of the individual time steps. Moreover, it contains additional information about the behaviour of homology classes along the time filtration parameter. Since we are only interested in detecting SNV cycles and not in determining their lifespan in the barcodes of the individual time steps, it suffices to compute the barcode of a suitable one-dimensional subfiltration of the bifiltered complex.
For reasons of notation, we start with . The f.g. one-dimensional persistence module obtained from this subfiltration can be viewed as a dimension reduction of the two-dimensional persistence module. Its barcode contains all the information we need to extract SNV cycles for each time step. Moreover, it tracks the stability of SNV cycles along the time filtration parameter.
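As an illustration of the bifiltered complex, the sketch below assigns to every simplex the earliest time step and the smallest scale at which it is present, and then walks along one path through the resulting two-parameter grid. The specific path used here (all time steps at the smallest scale, followed by the remaining scales at the final time step) is an assumption made for illustration, as are the toy data and the function names.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

seqs = ["AA", "AC", "CC", "CA"]   # hypothetical toy sequences
arrival = [1, 1, 2, 3]            # hypothetical time step of first appearance
D = [[hamming(a, b) for b in seqs] for a in seqs]

def bigrade(sigma):
    """Smallest (time step, scale) at which the simplex `sigma` is present."""
    t = max(arrival[i] for i in sigma)
    r = max((D[i][j] for i, j in combinations(sigma, 2)), default=0)
    return t, r

# All simplices up to dimension 2 on the full point set.
simplices = [s for k in (1, 2, 3) for s in combinations(range(len(seqs)), k)]

def complex_at(t, r):
    """Vietoris-Rips complex at scale r of the points present at time step t."""
    return [s for s in simplices if bigrade(s)[0] <= t and bigrade(s)[1] <= r]

n = max(arrival)
scales = sorted({D[i][j] for i, j in combinations(range(len(seqs)), 2)})
# Assumed path: first along time at the smallest scale,
# then along scale at the last time step.
path = [(t, scales[0]) for t in range(1, n + 1)] + [(n, r) for r in scales[1:]]
sizes = [len(complex_at(t, r)) for t, r in path]
print(path)   # [(1, 1), (2, 1), (3, 1), (3, 2)]
print(sizes)  # [3, 5, 8, 14]
```

The complexes along the path are nested, so they form a one-parameter filtration to which ordinary persistent homology applies.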
The idea of considering barcodes of subfiltrations follows a more general concept introduced by Cerri et al. [Bettinumbersmultipers] and called the fibered barcode by Lesnick and Wright [lesnick2015interactive]. Fibered barcodes are closely related to the rank invariant introduced by Carlsson and Zomorodian in [Carlsson2009multidimensionalpersistence]. In [Bettinumbersmultipers], it is shown that the fibered barcode and the rank invariant determine each other.
3 Distance deformation
In this section, we introduce a distance deformation technique to realise the one-dimensional subfiltration from Section 2 as a Vietoris-Rips filtration such that we have a correspondence between the barcodes of the two filtrations for the bars corresponding to SNV cycles.
For the following, let be the lowest power of such that . For example, if , then . For , let
Definition 3.1 (Distance deformation).
We define a new distance on as follows: let with . Define
The intuition behind is that time information is transformed into distances. Let . Then we have . Let with and . Assume that . Then we have
Consider the Vietoris-Rips filtration , where for ,
with filtration parameters
Applying first homology to this filtration again yields a f.g. one-dimensional persistence module. By construction, we have the following correspondence (illustrated in Figure 1).
Consider the barcodes and . Let . Then bars born in are in one to one correspondence with bars born in . Let . If a bar born in dies in , the corresponding bar born in dies in .
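The following sketch illustrates the idea of transforming time information into distances on toy data. The concrete encoding is an assumption made for this example and need not coincide with Definition 3.1: a pair of points at the smallest distance $r_1$ whose later point appears at time step $t$ receives the deformed distance $r_1 \cdot t / M$, where $M$ is taken to be the lowest power of 10 exceeding the number of time steps, and all other distances are left unchanged. The sketch then checks that thresholding the deformed distances at $r_1 \cdot t / M$ recovers exactly the edges present at the smallest scale in time step $t$.

```python
from fractions import Fraction
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def deform(D, arrival):
    """Hypothetical distance deformation (an assumption, see the text above).

    Pairs at the smallest distance r1 are rescaled to r1 * t / M, where t is the
    time step at which the later of the two points appears; other pairs are kept.
    """
    n_points, n_steps = len(D), max(arrival)
    M = 10
    while M <= n_steps:  # lowest power of 10 exceeding the number of time steps
        M *= 10
    r1 = min(D[i][j] for i, j in combinations(range(n_points), 2))
    deformed = [row[:] for row in D]
    for i, j in combinations(range(n_points), 2):
        if D[i][j] == r1:
            t = max(arrival[i], arrival[j])
            deformed[i][j] = deformed[j][i] = Fraction(r1) * t / M
    return deformed, r1, M

def edges_up_to(D, r, allowed=None):
    """Pairs at distance at most r, optionally restricted to a subset of indices."""
    idx = range(len(D)) if allowed is None else allowed
    return {(i, j) for i, j in combinations(idx, 2) if D[i][j] <= r}

seqs = ["AA", "AC", "CC", "CA"]   # hypothetical toy sequences
arrival = [1, 1, 2, 3]            # hypothetical time step of first appearance
D = [[hamming(a, b) for b in seqs] for a in seqs]
deformed, r1, M = deform(D, arrival)

for t in range(1, max(arrival) + 1):
    present = [i for i, a in enumerate(arrival) if a <= t]
    # Same edges: deformed filtration at scale r1*t/M vs. time step t at scale r1.
    assert edges_up_to(deformed, Fraction(r1) * t / M) == edges_up_to(D, r1, present)
print(deformed[0])  # [0, Fraction(1, 10), 2, Fraction(3, 10)]
```

Exact fractions are used to avoid rounding issues when comparing scales. In this encoding all points are present as vertices from the start, which adds isolated vertices but does not affect first homology.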
Using this correspondence, the definition of SNV cycles translates as follows.
Definition 3.4 (Deformed SNV cycle).
The underlying homology class representatives of bars in the barcode born in are called deformed SNV cycles.
Denote by a full set of deformed SNV cycle representatives extracted from . For , define
By construction, we have a bijection of sets
Moreover, we have compatibility: take a cycle in the set of representatives of one time step and assume that the image of its homology class under the canonical homomorphism to a later time step is not zero. Then the cycle also belongs to the set of representatives of the later time step, by construction. In addition, we can track the time-stability of SNV cycles, and instead of one barcode computation per time step, only a single barcode computation has to be performed. Since the deformed filtration is a Vietoris-Rips filtration, this barcode can be computed with Ripser [bauer2021ripser]. In practical experiments, one could investigate whether this new method provides a performance advantage over the classical approach to a time series analysis, where each time step is computed separately.
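To illustrate how a single barcode computation can replace the computations per time step, the sketch below computes the first persistent homology of a small deformed distance matrix with a textbook column reduction over $\mathbb{F}_2$ and reads off the number of SNV cycles at every time step. It is a toy stand-in for Ripser, not a description of it. The matrix is a hypothetical deformation of four sequences at Hamming distance 1 or 2 with three time steps, under the assumed encoding that a first-scale edge whose later endpoint appears at time step $t$ gets distance $t/10$; this encoding is an assumption and may differ from Definition 3.1.

```python
from fractions import Fraction as F
from itertools import combinations

def h1_bars(D):
    """Bars (birth, death) of H_1 over F_2 for the Vietoris-Rips filtration of D."""
    n = len(D)

    def diam(s):
        return max((D[i][j] for i, j in combinations(s, 2)), default=0)

    simplices = [s for k in (1, 2, 3) for s in combinations(range(n), k)]
    simplices.sort(key=lambda s: (diam(s), len(s), s))  # faces come before cofaces
    pos = {s: k for k, s in enumerate(simplices)}
    reduced = {}    # largest row index of a reduced column -> that column
    killed_by = {}  # position of a creating simplex -> position of its destroyer
    creators = []   # positions of edges that create a 1-cycle
    for k, s in enumerate(simplices):
        col = {pos[f] for f in combinations(s, len(s) - 1)} if len(s) > 1 else set()
        while col and max(col) in reduced:
            col = col ^ reduced[max(col)]   # column addition over F_2
        if col:
            reduced[max(col)] = col
            killed_by[max(col)] = k
        elif len(s) == 2:
            creators.append(k)
    bars = []
    for c in creators:
        birth = diam(simplices[c])
        death = diam(simplices[killed_by[c]]) if c in killed_by else float("inf")
        if birth < death:  # drop bars of length zero
            bars.append((birth, death))
    return bars

# Hypothetical deformed distances (see the assumptions stated above).
deformed = [
    [0,        F(1, 10), 2,        F(3, 10)],
    [F(1, 10), 0,        F(2, 10), 2],
    [2,        F(2, 10), 0,        F(3, 10)],
    [F(3, 10), 2,        F(3, 10), 0],
]
bars = h1_bars(deformed)  # the only barcode computation

def snv_count(t, M=10):
    """Number of bars alive at the deformed scale t / M that encodes time step t."""
    return sum(1 for birth, death in bars if birth <= F(t, M) < death)

print(bars)                               # [(Fraction(3, 10), 2)]
print([snv_count(t) for t in (1, 2, 3)])  # [0, 0, 1]
```

The birth value $3/10$ encodes that the cycle first appears at time step 3, and the death value 2 is an undeformed scale. A bar that died at a deformed scale $t'/10$ would instead indicate a cycle that becomes trivial at time step $t'$, which is the time-stability information that separate computations per time step do not provide.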