Numerical Linear Algebra in the Sliding Window Model

We initiate the study of numerical linear algebra in the sliding window model, where only the most recent W updates in the data stream form the underlying set. Much of the previous work in the sliding window model uses one of two frameworks: either the exponential histogram (Datar et al., SICOMP'02) or the smooth histogram (Braverman and Ostrovsky, FOCS'07). We show that some elementary linear algebraic problems, such as the estimation of matrix ℓ_p norms, can be addressed in the sliding window model using these frameworks. Specifically, we show that approximating matrix entrywise norms for all p < ∞ and Schatten p-norms for p = 0, 1, and 2 (which include the matrix rank, the trace or nuclear norm, and the Frobenius norm) is suitable for the smooth histogram framework. However, most "interesting" problems are not smooth; specifically, we show that the spectral norm, vector-induced matrix norms, generalized regression, low-rank approximation, and leverage scores are not amenable to the smooth histogram framework. To overcome this challenge, we generalize the smooth histogram framework from real-valued functions to matrix-valued functions. We believe that this framework is a natural way to study numerical linear algebra problems in the sliding window model and perhaps beyond. We then apply our framework to derive approximations for the following linear algebraic functions in the sliding window model: spectral sparsification, low-rank approximation, generalized regression, graph sparsification (and applications), matrix multiplication, and row-subset selection. In addition to the results presented in this paper, we hope that our new framework will be useful in developing future randomized numerical linear algebra algorithms.


1 Introduction

The advent of big data has reinforced efforts to design and analyze algorithms that process data streams, i.e., datasets that are presented as a sequence of items and can be examined in only one or a few passes. In such settings, one of the main research focuses is the streaming model, where data arrives sequentially, can be observed only once, and the proposed algorithms are allowed to use space that is typically sublinear in the size of the input. However, this model does not fully address settings where the data is time-sensitive: such applications include network monitoring [CM05, CG08, Cor13], event detection in social media [OMM14], etc. In such settings, recent data is considered more accurate and important than data that arrived prior to a certain time window.

To model such settings, Datar et al. [DGIM02] introduced the sliding window model, which is parameterized by the size W of the window. The parameter W represents the size of the "most recent" data, called the active data. It is precisely the active data, in contrast to the so-called "expired data", that one wishes to concentrate on in subsequent analyses. The objective is to design and analyze algorithms that compute statistics only on the active data using memory that is sublinear in the window size W. Since its introduction, much work has studied statistics that can be computed in the sliding window model [AM04, BDM02, BGO14, BO07, BGL18, BOZ12, CLLT12, DGIM02, DM07, FKZ05, LT06a, LT06b, TXB06, ZG08]. The analysis of all the above algorithms uses one of two frameworks: the exponential histogram [DGIM02] or the smooth histogram [BO07]. Unfortunately, despite being quite general, these frameworks cannot be applied to many interesting problems that have been extensively studied in the streaming model, such as clustering and submodular maximization, as well as most interesting numerical linear algebra problems.

While these frameworks can be used for some elementary linear-algebraic problems, such as estimating matrix ℓ_p norms, we show that many more complicated numerical linear algebra problems, such as spectral approximation, principal component analysis, and generalized linear regression, are not amenable to these frameworks. Recently, numerical linear algebra has found applications in many areas of computer science, and there is a rich literature on what can be achieved without the sliding window constraints (see [Woo14, DM17] for a survey of such approaches). A natural question thus arises:

Are there efficient algorithms for numerical linear algebra in the sliding window model?

In this paper, we initiate the study of numerical linear algebra in the sliding window model. The complexity of approximating various linear-algebraic functions in the streaming model (including determinants, inverses, low-rank approximation, matrix multiplication, and singular value computation) was asked by [Mut05] and has been followed by a large body of work presenting tight upper and lower bounds on the achievable guarantees for many numerical linear algebra problems. In this work, we present the first algorithms for spectral approximation, low-rank approximation, and approximate matrix multiplication in the sliding window model.

1.1 Our Contributions

Even though a few elementary linear-algebraic problems, such as estimating the Frobenius norm, can be addressed in the sliding window model using the smooth histogram framework [BO07], most of the “interesting” problems are not smooth. We show in Appendix B that various linear-algebraic functions are not smooth according to the definitions of [BO07] and therefore cannot be used in the smooth histogram framework. Namely, we show that the spectral norm, vector induced matrix norms, generalized regression, and low-rank approximation are not amenable to the smooth histogram framework. This motivates the need for new frameworks for problems of linear algebra in the sliding window model.

Our first contribution is a deterministic, space- and time-efficient algorithm for spectral approximation in the sliding window model, under the assumption that the matrix in the current window has bounded spectrum, i.e., the ratio between its largest and smallest nonzero singular values is bounded. (Throughout the paper, we write $A \succeq 0$ to indicate that $A$ is a positive semidefinite (PSD) matrix, i.e., $x^\top A x \ge 0$ for all vectors $x$. The reader might want to first consult Appendix A.1 for basic background on the sliding window model and Appendix A for details on notation.) For matrices $A$ and $B$, we use a stacking notation to represent the matrix whose rows are the rows of $A$ followed by the rows of $B$; the same notation is used when a single row vector $r$ is appended to a matrix $A$.

Theorem 1.1 (Deterministic Spectral Approximation).

Let be a stream of rows and be the size of the sliding window for some constant . Let be the matrix consisting of the most recent rows and suppose . Then given a parameter , there exists an algorithm that outputs a matrix such that

The algorithm uses space and has amortized update time.

We then give a randomized algorithm for spectral approximation that is space-optimal, up to logarithmic factors.

Theorem 1.2 (Space Efficient Spectral Approximation).

Let be a stream of rows and be the size of the sliding window for some constant . Let be the matrix consisting of the most recent rows and suppose across all matrices formed by consecutive rows in the stream. Then given a parameter , there exists an algorithm that outputs a matrix with a subset of (rescaled) rows of such that

and uses

space, with probability

. The total running time of the algorithm is , where is the length of the stream and is the input sparsity of the stream. (See Theorem 4.5.)

We remark that this result has applications to covariance estimation, generalized linear regression, row subset selection, and Schatten -norm estimation.

We also give an algorithm for low rank approximation that is optimal in space up to polylogarithmic factors [CW09]. In fact, we give the stronger result of outputting a rank projection-cost preserving sample, which is defined as follows:

Definition 1.3 (Rank Projection-Cost Preserving Sample [Cem15]).

For (resp. ), a matrix of rescaled rows of (resp. a matrix of rescaled columns of ) is a projection-cost preserving sample if, for all rank orthogonal projection matrices (resp. ),

Note that a rank projection-cost preserving sample provides a solution for low-rank approximation of a matrix by choosing the appropriate rank orthogonal projection matrix that minimizes .

Theorem 1.4 (Space Efficient Rank Projection-Cost Preserving Sample).

Let be a stream of rows and be the size of the sliding window for some constant . Let be the matrix consisting of the most recent rows and suppose across all matrices formed by consecutive rows in the stream. Then given a parameter , there exists an algorithm that outputs a matrix that is a rank projection-cost preserving sample of , using space, with probability at least . The total running time of the algorithm is , where is the length of the stream and is the input sparsity of the stream. (See Theorem 5.8.)

We show that our techniques have broad applications. We show that our analysis for the sum of the reverse online ridge leverage scores (Lemma 5.5) immediately gives an online algorithm for low-rank approximation that uses optimal space up to polylogarithmic factors. In this online setting, the rows of a matrix arrive sequentially and an algorithm must irrevocably choose, at each time, whether to store or discard the most recently arrived row. We emphasize that here the relevant parameter is the length of the stream rather than the size of the sliding window; we adopt this convention in Section 6.1 to be consistent with standard notation for online algorithms.

Theorem 1.5 (Online Rank Projection-Cost Preserving Sample).

There exists an online algorithm that, given a stream of rows as input, returns a matrix that is a rank projection-cost preserving sample of and uses space. (See Theorem 6.4.)

We then show that our deterministic spectral approximation algorithm can be used as a key ingredient for an algorithm that preserves $\|Ax\|_1$ in the sliding window model up to relative and additive errors, assuming certain conditions on the entries of $A$ and $x$.

Theorem 1.6 ($\ell_1$ Spectral Approximation).

Let be a stream of rows and suppose the entries of each row are integers that are bounded by some polynomial in . Let be the matrix consisting of the most recent rows. Given a parameter , there exists an algorithm that outputs a matrix such that for any vector with polynomially bounded entries,

with high probability. The algorithm uses space. (See Theorem 6.14.)

If the entries of the input matrix are not polynomially bounded, we can also obtain additive error (see Theorem 6.15).

We also use our downsampling techniques to give the first upper bound for covariance matrix approximation in the sliding window model. In this problem, the rows of a matrix arrive sequentially, and the goal is to output a matrix such that .

Theorem 1.7 (Covariance Matrix Approximation).

There exists an algorithm that performs covariance matrix approximation in the sliding window model, using bits of space. (See Theorem 6.20.)

We complement our covariance matrix approximation algorithm with a lower bound that is tight up to factors.

Theorem 1.8.

Any sliding window algorithm that performs covariance matrix approximation with probability at least requires bits of space. (See Theorem 6.24.)

Remark 1.9.

Our techniques for covariance matrix approximation can be easily adapted to the problem of approximate matrix multiplication of matrices and when the columns of arrive in sync with the rows of , and the goal is to approximate .

Our results can be summarized in Table 1.

Problem Space Reference
Deterministic Spectral Approximation Theorem 1.1
Spectral Approximation Theorem 4.5
Rank Approximation Theorem 5.8
Online Rank Approximation Theorem 6.4
Covariance Matrix Approximation Theorem 6.20, Theorem 6.24
Table 1: Space bounds for various numerical linear algebra problems, where lower order terms are omitted for ease of readability. Matrices have dimension , and is the approximation parameter. We also obtain space for Spectral Approximation in the sliding window model under a bit complexity assumption (see Theorem 6.14.)

1.2 Overview of Our Techniques

Smooth histograms and exponential histograms give a framework for sliding window algorithms for smooth functions (see Definition A.1). Informally, a function $f$ is smooth if, given substreams $A$ and $B$, where $B$ is a suffix of $A$ and $f(B)$ is a "good" approximation to $f(A)$, then $f(B \cup C)$ is also a "good" approximation to $f(A \cup C)$ for every substream $C$ that arrives afterwards. A first attempt would be to apply these frameworks to various linear-algebraic quantities by showing smoothness properties for these functions. Unfortunately, we show in Appendix B that many interesting linear-algebraic functions, including low-rank approximation, generalized regression, spectral norms, and vector-induced matrix norms, are not smooth. Therefore, we require a new approach altogether.
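
For concreteness, the standard $(\alpha, \beta)$-smoothness condition of Braverman and Ostrovsky can be stated as follows; this is our paraphrase of the condition the text calls Definition A.1, so the exact constants and side conditions may differ. A function $f \ge 1$ is $(\alpha, \beta)$-smooth, with $0 < \beta \le \alpha < 1$, if for every stream $A$, every suffix $B$ of $A$, and every stream $C$ arriving after $A$,

\[
  (1 - \beta)\, f(A) \le f(B)
  \quad \Longrightarrow \quad
  (1 - \alpha)\, f(A \cup C) \le f(B \cup C).
\]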

Deterministic Spectral Approximation.

Our first observation is that many problems in numerical linear algebra can be solved if we can maintain a spectral approximation of the covariance matrix in the sliding window model, i.e., compute a matrix whose spectrum is within a small relative factor of that of the covariance matrix of the window. The covariance matrix is convenient because the Loewner order interacts nicely with row updates in the following sense: if two substreams correspond to matrices $A$ and $B$, where $B$ is formed by a suffix of the rows of $A$ so that $B^\top B \preceq A^\top A$, then

\[
  B^\top B + C^\top C \;\preceq\; A^\top A + C^\top C
\]

for any matrix $C$ that represents a substream arriving right after those substreams. Although the similarity between the above consequence and the definition of smooth functions is notable, we cannot immediately apply existing data structures in the sliding window model because they consider real-valued functions. (Positive semidefinite ordering is a partial order; therefore, $A \not\preceq B$ does not imply that $B \preceq A$. On the other hand, for real numbers, $a \not\le b$ does imply that $b \le a$ — a fact that is used in a non-trivial manner in the existing results.)
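
The way this Loewner-order monotonicity substitutes for smoothness can be made explicit; the following one-line derivation is our own illustration with generic matrices $A$, $B$, and $C$, not a display taken from the paper. If $(1-\epsilon)\, A^\top A \preceq B^\top B \preceq A^\top A$, then adding the same PSD matrix $C^\top C$ to all sides (and using $(1-\epsilon)\, C^\top C \preceq C^\top C$) gives

\[
  (1-\epsilon)\,\big(A^\top A + C^\top C\big) \;\preceq\; B^\top B + C^\top C \;\preceq\; A^\top A + C^\top C,
\]

so an $\epsilon$-spectral approximation of a suffix remains an $\epsilon$-spectral approximation after any future rows are appended to both.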

We now describe a deterministic algorithm to show that spectral approximation is possible in the sliding window model, as well as to build a template for our more space-efficient algorithms. To gain additional intuition behind our algorithms, first consider the case when we do not have any space constraints. Then we can maintain the following naïve data structure that, at time $t$, stores $t$ matrices along with all of their timestamps, where the matrix associated with timestamp $i$ is the covariance matrix formed from the rows that arrived at times $i$ through $t$.

The space required by this naïve algorithm is prohibitively large. The key to our algorithm is the observation that many matrices in the naïve data structure have spectra that are quite "close" to each other and can be approximated by a single matrix. That is, we show that we only need to store a small subset of these matrices and can delete the rest of them.

The main point here is that, since we want a spectral approximation of the matrix formed by the sliding window, we only delete those matrices whose spectra are already well-approximated by one that we keep. We first describe a purely linear-algebraic data structure that achieves this task in Section 2. We then augment this data structure with the necessary components in Section 2.1 and Section 3 to maintain a spectral approximation in the sliding window model. In particular, the data structure stores a subset of the matrices, maintaining an invariant that controls how spectrally close stored matrices may be to one another. We show that this invariant not only suffices to give a spectral approximation, but also provides a good upper bound on the number of matrices that we store.

We can then break our algorithm into three main subroutines. Upon the arrival of each new row $r$, the procedure Update adds the rank-one term $r^\top r$ to each matrix in the data structure. We then use a procedure Compress to maintain the above invariant. Finally, we use Expire to remove any matrix that is too old for the sliding window.

Space Efficient Spectral Approximation.

Observe that the previous deterministic algorithm maintains a number of separate instances and also returns an approximation to the covariance matrix of the window, rather than to the matrix consisting of the most recent rows itself. Thus, we have two potential directions in which to improve upon the deterministic algorithm. First, we would like to produce a spectral approximation to the underlying matrix itself, in the form of a subset of rescaled rows, rather than an approximation to its covariance matrix. Second, we would like to improve the dependence on the dimension to something more comparable to the lower bounds that also hold for the easier, insertion-only streaming model.

The challenge for space-efficient spectral approximation in the sliding window model results from two conflicting forces. Suppose we have a good approximation to the matrix consisting of the rows that have already arrived in the stream. When a new row arrives, we would like to sample it with high probability if it is "important" relative to the previous rows; namely, if the new row has large magnitude or points in a direction not well covered by the previous rows, we would like to capture that information by sampling it. On the other hand, if the new row has low importance relative to the existing rows, then it seems we should not sample it.

However, the sliding window model also places great importance on the recent rows. Suppose the rows that follow the new row all contain only zeroes, and that all rows before it have expired at the time of the query. If we did not sample the new row, then we would be left with no information about the underlying matrix. Hence, we must always store the most recent row, and similarly place greater emphasis on more recent rows. Although the leverage score of a row is a good indicator of how "unique" (and hence important) the row is with respect to the previous rows, there is no standard measure that combines both uniqueness and recency.

On a positive note, it is possible to get a good approximation to each covariance matrix maintained by the deterministic algorithm using a much smaller number of rows, through any streaming algorithm for spectral approximation. We can then use the same maintenance rules to ensure that our algorithm only stores a subset of these approximations. Since each approximation can be expressed using a relatively small number of rows, whereas storing the exact covariance matrices can require substantially more space, we can store each approximation explicitly and output the appropriate one at the end of the stream as a good approximation for the window. This gives a randomized algorithm that outputs a spectral approximation to the matrix in the window itself, rather than to its covariance matrix as in the deterministic algorithm, but its space usage is actually worse than that of the deterministic algorithm.

Nevertheless, this randomized algorithm crucially gives a template for a more space-efficient spectral approximation. Observe that the rows forming a later suffix of the stream are a subset of the rows forming any earlier suffix, so the corresponding samples may share many rows, which suggests a more efficient form of row sampling. As previously discussed, recent rows are more important in the sliding window model than in the streaming model, regardless of the values of previous rows. Intuitively, if the rows are roughly uniform, then the "importance" of each row is inversely proportional to the number of rows that appear after it in the stream. Since the rows will likely not be uniform, a natural approach is to adapt squared norm sampling, where each row is sampled with probability proportional to its squared norm and inversely proportional to the sum of the squared norms of the rows that have appeared afterwards. However, squared norm sampling does not give relative error guarantees, which suggests instead using a sampling approach based on leverage scores.

To that effect, we introduce the concept of reverse online leverage scores (see Definition 4.1). Given a matrix with rows $a_1, \ldots, a_n$, the reverse online leverage score of row $a_i$ is the leverage score of $a_i$ with respect to the matrix formed by the rows $a_i, \ldots, a_n$. The reverse online leverage scores value both the recency of a row and its uniqueness. However, we cannot track the reverse online leverage scores of each row in the stream without storing the entire stream. Instead, we use the rows that we have already sampled as a proxy when computing approximations to the reverse online leverage scores: when deciding whether to keep a stored row, we compute its approximate reverse online leverage score with respect to the sampled rows that arrived after it.
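
As a concrete reference point, the following offline snippet computes reverse online leverage scores under the reading of the definition given above, namely the leverage score of row $a_i$ with respect to the suffix matrix formed by rows $a_i, \ldots, a_n$; the function name and the use of a pseudoinverse are our own choices rather than the paper's.

import numpy as np

def reverse_online_leverage_scores(A):
    """Offline reference: tau_i = a_i (A_{i:n}^T A_{i:n})^+ a_i^T,
    where A_{i:n} is the suffix of A consisting of rows i, ..., n."""
    n, d = A.shape
    scores = np.zeros(n)
    G = np.zeros((d, d))
    # Sweep from the last row to the first, maintaining the suffix Gram matrix.
    for i in range(n - 1, -1, -1):
        a = A[i]
        G += np.outer(a, a)                      # Gram matrix of rows i..n
        scores[i] = float(a @ np.linalg.pinv(G) @ a)
    return scores

# Sanity check: the most recent (nonzero) row always has score 1, mirroring the
# observation that the sliding window model must always retain the newest row.
A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
print(reverse_online_leverage_scores(A))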

In summary, our algorithm stores a small number of (rescaled) rows of the stream. Each time a new row arrives in the data stream, our algorithm first performs downsampling to decide whether the new row should be retained, based on the approximate reverse online leverage scores. Our algorithm then iterates over the stored rows and performs downsampling to decide whether each row should be retained, based on the rows that have been retained after it. Each time a row is retained, the appropriate rescaling of the row is performed. Thus the algorithm can reconstruct an approximation to each suffix of the stream by considering the matrix of all rows that have been sampled since the corresponding time.

To show correctness, we prove an invariant (Lemma 4.2 and subsequent corollaries) that says that each row is oversampled, i.e., sampled with probability sufficiently larger than its reverse online leverage score. Hence, the rows that have been sampled since any given time form a good spectral approximation to the matrix that contains all the rows that have arrived since that time in the data stream. This invariant holds for all times; in particular, it holds for the time at which the current window begins, and thus for the sliding window.

To bound the total number of rows our algorithm samples, we upper bound the probability that each row is sampled and show that it is at most a bounded factor times its reverse online leverage score. The sum of the reverse online leverage scores turns out to equal the sum of the online leverage scores, so it follows that our algorithm samples a correspondingly small number of rows with high probability.
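
For intuition about the resulting sample size, the bound from [CMP16] on the sum of online leverage scores that this argument leans on has, to the best of our understanding, the form

\[
  \sum_{i=1}^{n} \tau_i^{\mathrm{OL}}(A) \;=\; O\!\left(d \log \kappa\right),
\]

where $\kappa$ is the condition number of $A$; the exact logarithmic factors in the paper's statement may differ. Oversampling each row by a bounded factor relative to its reverse online leverage score therefore retains a comparably small number of rows in expectation.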

Because our space-efficient algorithm employs row sampling, it further enjoys the property that the total runtime depends on the input sparsity, up to polylogarithmic factors. By batching until a certain number of rows have arrived before we run an instance of the downsampling procedure, we can amortize the runtime needed to compute the reverse online leverage scores for each row. Again, it suffices to approximate each reverse online leverage score within some constant factor, since we can simply oversample with respect to the approximation guarantee. We can then use standard projection techniques to embed each of the sampled rows into a lower-dimensional subspace by applying a Johnson-Lindenstrauss transform, and compute the leverage scores of the rows with respect to the matrix of the transformed rows. For more details, see Section 4.
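
A minimal sketch of the projection idea mentioned above: approximate the leverage scores of the stored rows against a Johnson-Lindenstrauss sketch of those rows instead of forming the exact Gram matrix. The sketch size, the Gaussian choice of the map, and the function name are illustrative assumptions on our part; the construction in Section 4 may differ.

import numpy as np

def jl_approx_leverage_scores(M, eps=0.5, seed=0):
    """Approximate row leverage scores of M (m x d) by sketching M with a
    dense Gaussian JL map S and scoring rows against (S M)^T (S M)."""
    rng = np.random.default_rng(seed)
    m, d = M.shape
    k = min(m, int(np.ceil(8 * d / eps**2)))     # illustrative sketch size
    S = rng.normal(size=(k, m)) / np.sqrt(k)
    G = (S @ M).T @ (S @ M)                      # approximates M^T M
    Ginv = np.linalg.pinv(G)
    # tau_i ~= m_i G^+ m_i^T for each row m_i of M.
    return np.einsum('ij,jk,ik->i', M, Ginv, M)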

Space Efficient Projection Cost Preserving Sample.

Our space-efficient spectral approximation algorithm outputs a matrix whose spectrum approximates that of the matrix consisting of the most recent rows. However, certain tasks such as rank-$k$ approximation can be achieved by sampling a number of rows roughly proportional to the target rank, with the appropriate regularization [CMM17]. Thus it seems reasonable that these tasks can also be performed in the sliding window model by sampling a number of rows linear in the target rank rather than in the dimension.

Suppose we know the value of the regularization parameter $\lambda$ in advance, where $\lambda$ is determined by the cost of the best rank-$k$ approximation of the matrix in the window. Then we can use the same downsampling procedure as before with reverse online ridge leverage scores and regularization parameter $\lambda$. Using the same invariant as before, the procedure oversamples each row relative to its reverse online ridge leverage score. This suffices to show that the rows that have been sampled since any given time form a projection-cost preserving sample for the matrix that contains all the rows that have arrived since that time in the data stream. That is, the algorithm maintains a good low-rank approximation for the matrix in the window.
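
To make the regularization concrete, a standard offline choice following [CMM17] sets $\lambda = \|A - A_k\|_F^2 / k$ and defines the $\lambda$-ridge leverage score of row $a_i$ as $a_i (A^\top A + \lambda I)^{-1} a_i^\top$. The snippet below only fixes these offline definitions; the sliding window algorithm uses reverse online versions and must estimate $\lambda$, as described next.

import numpy as np

def ridge_leverage_scores(A, k):
    """Offline lambda-ridge leverage scores with lambda = ||A - A_k||_F^2 / k."""
    n, d = A.shape
    s = np.linalg.svd(A, compute_uv=False)
    lam = float(np.sum(s[k:] ** 2)) / k          # tail mass of the spectrum, over k
    G = A.T @ A + lam * np.eye(d)
    Ginv = np.linalg.pinv(G)                     # pinv in case the tail is zero
    return np.einsum('ij,jk,ik->i', A, Ginv, A), lam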

The total number of rows our algorithm samples is again bounded by some multiple of the sum of the reverse online ridge leverage scores, which equals the sum of the online ridge leverage scores. [CMP16] previously showed that the sum is bounded by . However, this does not suffice for our purpose. Thus, we provide a tighter analysis when , showing that the sum of the online ridge leverage scores is then . Hence under additional assumptions about the underlying singular values and suppressing and factors, the total space used by our algorithm is , when is known in advance and in fact, any constant factor underestimation of suffices to achieve an algorithm with the same asymptotic space.

We now describe a procedure Estimate to obtain this constant factor underestimation, which we achieve, perhaps surprisingly, by using another projection-cost preserving sample. Cohen et al. [CEM15] show that $A\Pi$ is a projection-cost preserving sketch for $A$ when $\Pi$ is a sufficiently large dense Johnson-Lindenstrauss matrix. So if we obtain $A\Pi$, we can approximate the cost of the best low-rank approximation. Note that we cannot use $A\Pi$ directly to obtain a projection-cost preserving sample, since $A\Pi$ does not preserve the rows of $A$. Moreover, while $A\Pi$ has few columns, it still has as many rows as $A$, so we can only afford to maintain a spectral approximation of $A\Pi$. Fortunately, our space-efficient spectral approximation algorithm Meta-Spectral does exactly that when we feed it the rows of $A\Pi$. Hence, we can use the spectral approximation of $A\Pi$ to repeatedly update the constant factor underestimation. Similar to our spectral approximation algorithm, our low-rank approximation algorithm has total runtime depending on the input sparsity, up to lower order factors. We describe the algorithm in full in Section 5.
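
The following sketch illustrates the estimation idea attributed to [CEM15] above: right-multiply by a dense Johnson-Lindenstrauss matrix $\Pi$ with roughly $O(k/\epsilon^2)$ columns and read off the tail $\|A\Pi - (A\Pi)_k\|_F^2$ as a proxy for $\|A - A_k\|_F^2$. The constants, the Gaussian choice of $\Pi$, and the function name are our illustrative assumptions.

import numpy as np

def estimate_tail_cost(A, k, eps=0.5, seed=0):
    """Estimate ||A - A_k||_F^2 from the sketch A @ Pi."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = min(d, int(np.ceil(8 * k / eps**2)))     # illustrative sketch width
    Pi = rng.normal(size=(d, m)) / np.sqrt(m)    # dense JL matrix
    s = np.linalg.svd(A @ Pi, compute_uv=False)
    return float(np.sum(s[k:] ** 2))             # tail of the sketched spectrum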

Applications of our Techniques.

We show that our techniques have applications to linear-algebraic functions in a number of other interesting models. We start by using our analysis for the sum of the reverse online ridge leverage scores to give a nearly space-optimal algorithm for low-rank approximation in the online setting. In this problem, the rows of an underlying matrix $A$ arrive sequentially in a stream and an algorithm must irrevocably either store or discard each row, in order to output a good rank-$k$ approximation of $A$. If we set the regularization to $\lambda = \|A - A_k\|_F^2 / k$, where $A_k$ is the best rank-$k$ approximation of $A$, then it is known that by sampling each row with probability at least its $\lambda$-ridge leverage score, we can obtain a good rank-$k$ approximation of $A$ [CMM17]. Moreover, the online $\lambda$-ridge leverage score of a row is at least its $\lambda$-ridge leverage score, so it suffices to sample each row with probability at least its online $\lambda$-ridge leverage score. Thus, if we knew the value of $\lambda$, then our tighter bounds for the sum of the online $\lambda$-ridge leverage scores would imply a nearly space-optimal algorithm for the online low-rank approximation problem. Hence, we simultaneously run an algorithm to estimate the value of $\lambda$, similar to the Estimate procedure described previously.

We then show that our deterministic spectral approximation algorithm can be applied as a crucial subroutine for an algorithm to preserve $\|Ax\|_1$ in the sliding window model up to relative and additive errors, assuming the entries of $A$ and $x$ are integers bounded in absolute value by a fixed polynomial. We would like to form an $\ell_1$ spectral approximation in the sliding window model. However, it does not seem evident how to bound "online" versions of the Lewis weights [CP15] or the $\ell_1$ leverage scores [DDH08]. While our analysis bounding the sum of the online ridge leverage scores relates the change in volume of the parallelepiped spanned by the columns of the stream to the online ridge leverage score of the new row, a similar geometric argument using (online) Lewis weights or (online) $\ell_1$ leverage scores does not seem obvious. A primary reason for this is that, unlike the $\ell_2$ unit ball, the $\ell_1$ unit ball is not an ellipsoid, and it is not clear how the John ellipsoid or the Lewis weights of this polytope change when a new row is added.

Instead, we observe that if the entries of $A$ and $x$ are bounded by fixed polynomials, then each time $\|Ax\|_1$ increases by a relative factor, there must also be a significant increase in $\|Ax\|_2$. Therefore, we can use our deterministic spectral approximation algorithm to track the times at which this relative increase might happen. At each of these times, we run a separate streaming algorithm for $\ell_1$ spectral approximation, and we output the instance corresponding to the sliding window at the end of the stream. This approach is highly reminiscent of the smooth histogram data structure and further illustrates why our deterministic spectral approximation algorithm may be of independent interest as a framework with possible applications to other linear-algebraic functions in the sliding window model.
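
One elementary relation that makes the coupling between $\ell_1$ and $\ell_2$ quantities plausible for integer vectors is the following observation of ours (it is illustrative and not the paper's precise argument). If $v$ is an integer vector with $|v_i| \le T$ for all $i$, then $|v_i| \le v_i^2$ whenever $v_i \neq 0$ and $v_i^2 \le T\,|v_i|$, so

\[
  \|v\|_1 \;\le\; \|v\|_2^2 \;\le\; T\,\|v\|_1 .
\]

In particular, when the entries of $A$ and $x$ are integers bounded by a polynomial, $\|Ax\|_1$ and $\|Ax\|_2^2$ always lie within a polynomial factor of each other.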

Finally, we show that our downsampling techniques can be used to perform covariance matrix approximation in the sliding window model. It is known that squared row norm sampling suffices for approximate matrix multiplication in the offline setting. In the sliding window model, each time a new row arrives, the Frobenius norm of the underlying matrix changes; therefore, all previous rows must be downsampled accordingly. Although we do not know the Frobenius norm of the underlying matrix, we certainly know the squared norm of each row when it arrives, which can be used to obtain an estimate of the Frobenius norm using smooth histograms. Hence, we can emulate squared row norm sampling by tracking the probability with which each row stored by our algorithm has been sampled, and downsampling accordingly.
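
For reference, offline squared row norm sampling for covariance estimation (and approximate matrix multiplication) looks as follows; the sample size and function name are illustrative, and the sliding window version in Section 6.3 additionally re-downsamples stored rows as the window's Frobenius mass changes.

import numpy as np

def squared_norm_sample_covariance(A, s, seed=0):
    """Sample s rows of A with probability proportional to their squared norms
    and rescale them so that E[B^T B] = A^T A."""
    rng = np.random.default_rng(seed)
    norms = np.einsum('ij,ij->i', A, A)          # squared row norms
    p = norms / norms.sum()                      # sampling distribution
    idx = rng.choice(A.shape[0], size=s, p=p)
    B = A[idx] / np.sqrt(s * p[idx])[:, None]    # rescale the kept rows
    return B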

Organization of the rest of the paper.

As a warm-up, we describe in Section 2 a data structure that serves as a generalization of the smooth histogram framework from real-valued functions to matrix-valued functions. We then describe in Section 3 how this data structure can be used to obtain our deterministic algorithm for spectral approximation in the sliding window model. We improve upon both the time and space complexities by giving a sampling based algorithm for spectral approximation in Section 4. We show in Section 5 that we can further optimize the space complexity if we are interested in low-rank approximation, while also giving a tighter analysis on the sum of the online ridge leverage scores. In Section 6, we give applications of our techniques, first showing that this tighter analysis implies a nearly space optimal algorithm for online low-rank approximation. We then discuss spectral approximation in Section 6.2 and covariance matrix approximation in Section 6.3.

We give a brief overview of the background of the sliding window model in Appendix A. We show in Appendix B that various linear-algebraic functions cannot be used in the smooth histogram framework.

2 A General Data Structure

In this section, we describe a general data structure with the relevant numerical linear algebra properties that we shall eventually use in the sliding window model.

Definition 2.1.

Given a parameter , we define a spectrogram to be a data structure that consists of a sequence of PSD matrices such that

(Invariant )

Further, supports the following operations:

  • Update: For a row vector $r$, the operation adds $r^\top r$ to each matrix in the sequence and appends the matrix $r^\top r$ to the end of the sequence.

  • Compress: The operation enforces the invariant for each matrix in the sequence, by deleting a number of matrices and reordering the indices.

  • Expire: The operation erases the oldest matrix and reorders the indices.

We define the operations Update, Compress, and Expire formally as follows.

0:  A data structure that consists of a sequence of PSD matrices , and a row vector .
0:  An updated data structure where is added to each matrix and also appended at the end.
1:  Set .
2:  for  do
3:     .
4:  end for
5:  .
6:  return.
Algorithm 1
0:  A data structure that consists of a sequence of PSD matrices and approximation parameter .
0:  An updated data structure that enforces Invariant .
1:  for  do
2:     Let . This implies .
3:     if  then
4:        Delete .
5:        for  to  do
6:           .Reorder indices.
7:        end for
8:        .
9:     end if
10:  end for
11:  return.
Algorithm 2
0:  Data structure consisting of PSD matrices .
0:  An updated data structure .
1:  Delete .
2:  for  to  do
3:     .Reorder indices.
4:  end for
5:  .
6:  return.
Algorithm 3
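
Since the pseudocode above lost its mathematical content in extraction, the following Python sketch reconstructs the three operations from the surrounding prose: Update adds the outer product of the new row to every stored matrix and appends a fresh one, Compress deletes a middle matrix whenever its two neighbors are already spectrally close, and Expire drops the oldest matrix. The eigenvalue-based closeness test and all constants are our own simplifications rather than the paper's exact conditions.

import numpy as np

class Spectrogram:
    """Hedged reconstruction of the spectrogram data structure of Section 2.
    mats[i] is the covariance of the rows seen since the i-th retained
    timestamp, so mats[0] >= mats[1] >= ... in the Loewner order."""

    def __init__(self, d, eps):
        self.d, self.eps = d, eps
        self.mats = []

    def update(self, r):
        # Add the outer product of the new row to every stored matrix,
        # then append a fresh matrix covering only the new row.
        rr = np.outer(r, r)
        self.mats = [X + rr for X in self.mats]
        self.mats.append(rr.copy())

    def _close(self, X, Y):
        # True if (1 - eps) X <= Y <= X in the Loewner order (a simplified
        # stand-in for the Invariant of Definition 2.1).
        lo = np.linalg.eigvalsh(Y - (1.0 - self.eps) * X).min() >= -1e-9
        hi = np.linalg.eigvalsh(X - Y).min() >= -1e-9
        return lo and hi

    def compress(self):
        # Delete a middle matrix whenever its two neighbors are already
        # spectrally close, so the surviving matrices stay separated.
        i = 0
        while i + 2 < len(self.mats):
            if self._close(self.mats[i], self.mats[i + 2]):
                del self.mats[i + 1]
            else:
                i += 1

    def expire(self):
        # Drop the oldest matrix (the augmented version in Section 2.1 only
        # does this once the matrix describes no active rows).
        if self.mats:
            del self.mats[0]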
Observation 2.2.

The procedure increases each matrix in by and then deletes a number of matrices so that for each matrix that remains in .

Observation 2.3.

The procedure enforces for each matrix that remains in , enforcing Invariant .

Observation 2.4.

The procedure deletes and reorders the indices.

We now show that the number of matrices in a spectrogram can be bounded, given certain restrictions on the eigenvalues of the matrices in the data structure.

Lemma 2.5.

Suppose is a spectrogram that contains matrices satisfying Invariant . If there exist parameters such that for each and , either or , then .

Proof.

First, observe that there cannot exist matrices and with such that for all , by Invariant . In other words, each pair of matrices must have some eigenvalues that differ by a factor of at least . By assumption, the eigenvalues of each satisfy , or . Thus for a fixed for which , there can be at most indices such that . Each matrix has dimension , and so there are eigenvalues, which implies there can be at most indices in . ∎
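
Filling in the count, the pigeonhole argument above gives a bound of roughly the following form (our reconstruction; Lemma 2.5 has the precise statement). Since every pair of surviving matrices must differ by a factor of at least $1+\epsilon$ in some eigenvalue, and each of the $d$ eigenvalues is either $0$ or lies in $[\lambda_{\min}, \lambda_{\max}]$, the number $s$ of stored matrices satisfies

\[
  s \;\le\; d\left(1 + \log_{1+\epsilon} \frac{\lambda_{\max}}{\lambda_{\min}}\right)
    \;=\; O\!\left(\frac{d}{\epsilon}\,\log \frac{\lambda_{\max}}{\lambda_{\min}}\right).
\]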

2.1 Data Structure Augmented with Timestamps

We now define a data structure by augmenting each of the matrices in a spectrogram data structure with a corresponding timestamp .

Definition 2.6.

We define a data structure to be a spectral histogram on a data stream of rows and a sliding window parameter if it contains a sequence of PSD matrices , along with corresponding timestamps that satisfy the following properties:

  1. Property 1: For each , .

  2. Property 2: For each , one of the following holds:

    1. Property 2a: and .

    2. Property 2b: and if , then .

  3. Property 3: .

Observation 2.7.

Property 2 is equivalent to Invariant .

Property 1 and Property 2 together should be considered a timestamp augmented analogue of Invariant , while Property 3 should be considered a timestamp augmented analogue of procedure Expire.

Thus we define AugUpdate, AugCompress, and AugExpire to be time-sensitive analogues of Update, Compress, and Expire.

0:  Data structure (spectral histogram) consisting of PSD matrices and corresponding timestamps , row vector , and current timestamp .
0:  Updated data structure .
1:  Set and .
2:  for  do
3:     .
4:  end for
5:  .
6:  return.
Algorithm 4
0:  Data structure (spectral histogram) consisting of PSD matrices with corresponding timestamps , and approximation parameter .
0:  Updated data structure .
1:  for  do
2:     Compute . This implies .
3:     if  then
4:        Delete and .
5:        for  to  do
6:           Reorder indices.
7:           
8:        end for
9:        .
10:     end if
11:  end for
12:  return.
Algorithm 5
0:  Data structure (spectral histogram) consisting of PSD matrices and timestamps , current time , and window size .
0:  Updated data structure .
1:  if  then If first two timestamps are both expired.
2:     Delete and .
3:     for  to  do
4:        Reorder indices.
5:        
6:     end for
7:     .
8:  end if
9:  return.
Algorithm 6

3 Spectral Approximation in the Sliding Window Model

We now show how a spectral histogram can be used to maintain a spectral approximation in the sliding window model.

0:  A stream of rows , a parameter for the size of the sliding window, and an accuracy parameter .
0:  A -spectral approximation in the sliding window model.
1:  .
2:  .
3:  for each row  do
4:     
5:     
6:     
7:  end for
8:  return.
Algorithm 7 Deterministic Algorithm for Spectral Approximation
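
In the same hedged spirit as the Section 2 sketch, the main loop of Algorithm 7 can be exercised as follows, reusing the Spectrogram class from that sketch and tracking timestamps by hand so that expiration only fires once a stored matrix describes no active rows; the window size, dimension, and data below are placeholders.

import numpy as np

d, W, eps = 4, 50, 0.25
rng = np.random.default_rng(0)
S, ts = Spectrogram(d, eps), []

for t in range(1, 201):
    r = rng.normal(size=d)            # next row of the stream
    S.update(r)                       # AugUpdate: fold r into all matrices
    ts.append(t)                      # timestamp of the newly appended matrix
    before = list(S.mats)
    S.compress()                      # AugCompress: then realign timestamps
    ts = [ts[i] for i, X in enumerate(before) if any(X is Y for Y in S.mats)]
    while len(ts) >= 2 and ts[1] <= t - W:
        S.expire()                    # AugExpire: oldest matrix is redundant
        ts.pop(0)

# At query time, the first matrix whose timestamp lies inside the window
# (mats[0] or mats[1]) serves as the spectral approximation of the window's
# covariance, in the sense of Theorem 3.8.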
Lemma 3.1.

Algorithm 7 maintains a spectral histogram, i.e., Definition 2.6.

Upon receiving each row in the stream, Algorithm 7 runs three subroutines: AugUpdate, AugCompress, and AugExpire. We show that these subroutines enforce the three important properties in Definition 2.6. The proof of Lemma 3.1 follows immediately from Lemma 3.2, Lemma 3.3, and Lemma 3.7.

Intuitively, the AugUpdate subroutine updates each of the matrices to maintain spectral approximations, incorporating the newly arrived row, so that Property 1 holds (see Lemma 3.2). The AugCompress subroutine ensures that the number of matrices remains small, by removing the middle matrix whenever three consecutive matrices have spectra within the prescribed factor of one another, and maintains Property 2 (see Lemma 3.3). The AugExpire subroutine also ensures that the number of matrices remains small, by removing any matrix that is "too old" for the sliding window, so that Property 3 holds (see Lemma 3.7).

We first show that Property 1 is satisfied after running the AugUpdate subroutine.

Lemma 3.2.

Upon receiving row , the data structure maintained by Algorithm 7 after the AugUpdate subroutine satisfies Property 1.

Proof.

By the AugUpdate subroutine, , enforcing Property 1. Since the AugCompress and AugExpire subroutines only delete and reorder indices, the invariant is maintained by the algorithm. ∎

We next show that Property 2 is satisfied after running the AugCompress subroutine. The following lemma says that the spectra of two adjacent matrices are either “far” (Property 2a) or “close” (Property 2b), but in the latter case, there cannot be three adjacent matrices whose spectra are within a factor.

Lemma 3.3.

Upon receiving row , the data structure maintained by Algorithm 7 after the AugCompress subroutine consists of matrices that satisfy Property 2. That is, for each , one of the following holds:

  1. Property 2a: and .

  2. Property 2b and if , then .

Proof.

Let be the timestamps maintained by the data structure after receiving row and be the timestamps maintained by the data structure after receiving row . We fix an index and observe that if , then cannot have been deleted in a previous step. Thus, , so that for some index . Let be the corresponding matrices after receiving row . We say that is the successor of in and is the successor of (and thus the successor of ) in .

We first consider the case when , i.e., whether the successor of in was deleted subsequently.

Claim 3.4.

If , then Property 2b holds.

Proof.

If , then the AugCompress subroutine must have deleted timestamp but not timestamp (since ). By the deletion condition of the AugCompress subroutine, it follows that and also if , then . Thus, Property 2b holds and the claim follows. ∎

We now perform a case analysis on whether , i.e., whether the successor of in was also the successor of in . For ease of analysis, we break the case into two parts, depending on whether or not.

Claim 3.5.

If and , then Property 2b holds.

Proof.

If and , i.e., the successor of is not , then consider the first time at which point became the successor of . That is, is the first timestamp where all timestamps between and have been removed by the AugCompress subroutine.

By the deletion condition of the AugCompress subroutine, it follows that , where (and similarly for ). Therefore,

We remark that this property is somewhat analogous to the smoothness property required for smooth histograms (see Definition A.1), but generalized to PSD matrices.

In other words, , since and . Moreover, if , then holds by the deletion condition of the AugCompress subroutine, the observation that has a PSD ordering, and the fact that , i.e., the index was not deleted. Thus, Property 2b holds and the claim follows. ∎

Claim 3.6.

If and , then either Property 2a holds or Property 2b holds.

Proof.

Finally, if and , i.e., the successor of is , then , so the successor of is also . Then either so Property 2a holds or . In the second case, Property 2b holds, since if , then holds by the deletion condition of the AugCompress subroutine, the observation that has a PSD ordering, and the fact that , i.e., the index was not deleted, and the claim follows. ∎

The proof of Lemma 3.3 then follows from Claim 3.4, Claim 3.5, and Claim 3.6. ∎

Finally, we show that Property 3 is satisfied after running the AugExpire subroutine.

Lemma 3.7.

Upon receiving row , the data structure maintained by Algorithm 7 after the AugExpire subroutine consists of timestamps that satisfy .

Proof.

By the AugExpire subroutine, is deleted if . The inequality follows since . Since the AugUpdate and AugCompress subroutines cannot delete and , then the invariant is maintained by the algorithm. ∎

We now show that Algorithm 7 maintains a spectral approximation in the sliding window model.

Theorem 3.8.

For any time , let . Moreover, suppose that any nonzero eigenvalue of any matrix maintained by Algorithm 7 satisfies . Then Algorithm 7 outputs a spectral approximation to , using space.

Proof.

Let consist of matrices and corresponding timestamps . By Property 3 in Lemma 3.1, so that . If , then and so by Property 1. Otherwise, if , then by Property 2,