1. Introduction
Given multiple time series data (e.g., measurements from multiple sensors) and a time range (e.g., 1:00 am to 3:00 am yesterday), how can we efficiently discover latent factors of the time series in the range? Revealing hidden factors in time series is important for analyzing patterns and tendencies encoded in the time series data. Singular value decomposition (SVD) effectively finds hidden factors in data, and has been extensively utilized in many data mining applications such as dimensionality reduction (Ravi Kanth et al., 1998), principal component analysis (PCA) (Jolliffe, 2002; Wall et al., 2003), data clustering (Simek et al., 2004; Osiński et al., 2004), tensor analysis (Sael et al., 2015; Jeon et al., 2015; Jeon et al., 2016b; Jeon et al., 2016a; Park et al., 2016; Oh et al., 2018), graph mining (Kang et al., 2012; Tong et al., 2006; Kang et al., 2011, 2014), and recommender systems (Koren et al., 2009; Park et al., 2017). SVD has also been successfully applied to stream mining tasks (Wall et al., 2003; Spiegel et al., 2011) in order to analyze time series data. However, methods based on standard SVD (Brand, 2003; Ross et al., 2008; Zadeh et al., 2016; Halko et al., 2011) are not suitable for finding latent factors in an arbitrary time range, since they have an expensive computational cost and have to store all the raw data. This limitation makes it difficult to investigate patterns of a time range in a stream environment, even though analyzing a specific past event or finding recurring patterns in time series is important (Papadimitriou and Yu, 2006). A naive approach for a time range query on time series is to store all of the arrived data and apply SVD to the data, but this approach is inefficient since it requires huge storage space, and the computational cost of SVD for a long time range query is expensive.
In this paper, we propose ZoomSVD (Zoomable SVD), an efficient method for revealing hidden factors of multiple time series in an arbitrary time range. With ZoomSVD, users can zoom in to find patterns in a specific time range of interest, or zoom out to extract patterns in a wider time range. ZoomSVD comprises two phases: the storage phase and the query phase. ZoomSVD considers multiple time series as a set of blocks of a fixed length. In the storage phase, ZoomSVD carefully compresses each block using SVD and low-rank approximation to reduce storage cost, and incrementally updates the most recent block with newly arrived data. In the query phase, ZoomSVD efficiently computes the SVD result in a given time range based on the compressed blocks. Through extensive experiments with real-world multiple time series data, we demonstrate the effectiveness and the efficiency of ZoomSVD compared to other methods, as shown in Figure 1. The main contributions of this paper are summarized as follows:

Algorithm. We propose ZoomSVD, an efficient method for extracting key patterns from multiple time series data in an arbitrary time range.

Analysis. We theoretically analyze the time and the space complexities of our proposed method ZoomSVD.

Experiment. We present experimental results showing that ZoomSVD computes time range queries faster, and requires less space, than other methods. We also confirm that ZoomSVD provides the best trade-off between efficiency and accuracy.
The code and datasets for this paper are available at http://datalab.snu.ac.kr/zoomsvd. In the rest of this paper, we describe the preliminaries and formally define the problem in Section 2, propose our method ZoomSVD in Section 3, present experimental results in Section 4, demonstrate the case study in Section 5, discuss related work in Section 6, and conclude in Section 7.
Symbol     | Description
-----------|---------------------------------------------------------------
b          | initial block size
ξ          | threshold for low-rank approximation
k          | number of singular values
k_i        | number of singular values in the i-th block
[A; B]     | vertical concatenation of two matrices A and B
X          | raw multiple time series data
B^(i)      | i-th block of X
U^(i)      | left singular vector matrix of B^(i)
Σ^(i)      | singular value matrix of B^(i)
V^(i)      | right singular vector matrix of B^(i)
U_(s,e)    | left singular vector matrix computed in the query phase
Σ_(s,e)    | singular value matrix computed in the query phase
V_(s,e)    | right singular vector matrix computed in the query phase
S_U        | set of left singular vector matrices
S_Σ        | set of singular value matrices
S_V        | set of right singular vector matrices
[t_s, t_e] | time range query
t_s        | starting point of the time range query
t_e        | ending point of the time range query
i_s        | index of the block matrix corresponding to t_s
i_e        | index of the block matrix corresponding to t_e
2. Preliminaries
We describe preliminaries on singular value decomposition (SVD) and incremental SVD (Sections 2.1 and 2.2). We then define the problem handled in this paper (Section 2.3). Table 1 lists the symbols used in this paper.
2.1. Singular Value Decomposition (SVD)
SVD is a decomposition method for finding latent factors in a matrix A ∈ R^{n×m}. Suppose the rank of the matrix A is r. Then, the SVD of A is represented as A = UΣV^T, where Σ ∈ R^{r×r} is a diagonal matrix whose diagonal entries are singular values. The i-th singular value σ_i is located at Σ_ii, where σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0. U ∈ R^{n×r} is called the left singular vector matrix (or a set of left singular vectors) of A; U is a column orthogonal matrix whose columns u_1, ⋯, u_r are the eigenvectors of AA^T. V ∈ R^{m×r} is the right singular vector matrix of A; V is a column orthogonal matrix whose columns v_1, ⋯, v_r are the eigenvectors of A^T A. Note that the singular vectors in U and V are used as hidden factors to analyze the data matrix A.

Low-rank approximation. Low-rank approximation effectively approximates the original data matrix based on SVD. The key idea of low-rank approximation is to keep the top k highest singular values and the corresponding singular vectors, where k is a number smaller than the rank of the original matrix. The low-rank approximation of A is represented as follows:

    Ã = U_k Σ_k V_k^T

where the reconstruction Ã is the low-rank approximation of A, U_k ∈ R^{n×k}, Σ_k ∈ R^{k×k}, and V_k ∈ R^{m×k}. The error of the low-rank approximation is represented as follows:

    ||A − Ã||_F² = Σ_{i=k+1}^{r} σ_i²

where ||·||_F is the Frobenius norm of a matrix, and r is the rank of the original matrix. The parameter k for low-rank approximation is determined by the following equation:

    k = min{ k' | (Σ_{i=1}^{k'} σ_i²) / (Σ_{i=1}^{r} σ_i²) ≥ ξ }    (1)

where ξ is a threshold between 0 and 1.
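To make the threshold rule of Equation (1) concrete, the following is a minimal numpy sketch (the function name and the toy matrix are illustrative, not part of the paper): it picks the smallest k whose retained singular values cover a fraction ξ of the squared Frobenius norm.

```python
import numpy as np

def low_rank_approx(A, xi=0.95):
    """Truncated SVD keeping the smallest k whose singular values
    retain a fraction xi of the squared Frobenius norm (Equation (1))."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, xi) + 1)  # smallest k with ratio >= xi
    return U[:, :k], s[:k], Vt[:k, :]

# Example: a rank-2 matrix is captured by very few factors
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 8))
U, s, Vt = low_rank_approx(A, xi=0.99)
A_approx = U @ np.diag(s) @ Vt
```

By construction, the reconstruction A_approx retains at least the fraction ξ of the squared Frobenius norm of A.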
2.2. Incremental SVD
Incremental SVD dynamically calculates the SVD result of a matrix when new data rows arrive. Suppose that we have the SVD result U_t, Σ_t, and V_t of a data matrix A_t at time t. When a Δ×m matrix A_Δ arrives at time t+1, the purpose of incremental SVD is to efficiently obtain the SVD result of [A_t; A_Δ] based on the previous result U_t, Σ_t, and V_t. Note that [A; B] denotes the vertical concatenation of two matrices A and B. Incremental SVD is used to analyze patterns in time series data (Sarwar et al., 2002), and several efficient methods for incremental SVD have been proposed (Brand, 2003; Ross et al., 2008). This incremental SVD technique is exploited in our method to incrementally compress and store the data (see Algorithm 1 in Section 3.2).
2.3. Problem Definition
We formally define the time range query problem as follows:
Problem 1 (Time Range Query on Multiple Time Series).

Given: a time range [t_s, t_e], and multiple time series data represented by a matrix X ∈ R^{n×m}, where n is the length of the time dimension and m is the number of time series,

Find: the SVD result of the submatrix X_(s,e) of X in the time range [t_s, t_e] quickly, without storing all of X. The SVD result includes U_(s,e) ∈ R^{(t_e−t_s+1)×r}, Σ_(s,e) ∈ R^{r×r}, and V_(s,e) ∈ R^{m×r}, where r is the rank of the submatrix.
Applying the standard SVD or incremental SVD to the time range query is impractical for the following reasons. Standard SVD needs to extract the submatrix corresponding to the time range before performing decomposition. Iwen et al. (Iwen and Ong, 2016) proposed a hierarchical and distributed approach for computing Σ and V, but not U, of a whole matrix X. Zadeh et al. (Zadeh et al., 2016) introduce Tall and Skinny SVD, which obtains Σ and V by computing the eigendecomposition of X^T X, and then computes U using X, Σ, and V. Halko et al. (Halko et al., 2011) propose Randomized SVD, which computes the SVD of X using randomized approximation techniques. However, such methods are inefficient because they need to compute SVDs from scratch for multiple overlapping queries. Furthermore, those methods need to keep the entire time series data X, which is practically infeasible in many streaming applications. Incremental SVD considers updates only on newly added data, and thus cannot perform SVD on a specific time range.
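The naive baseline described above can be written in a few lines of numpy (an illustrative sketch, not part of the paper): it must keep the whole raw matrix X in memory and repeat the full decomposition for every query, which is exactly the cost this paper aims to avoid.

```python
import numpy as np

def naive_range_svd(X, ts, te):
    """Naive baseline: slice the stored raw matrix and run a full SVD.
    Requires keeping all of X and redoing the decomposition per query."""
    sub = X[ts:te + 1]          # rows in the query range [ts, te]
    return np.linalg.svd(sub, full_matrices=False)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 6))   # 500 time ticks, 6 sensors
U, s, Vt = naive_range_svd(X, 100, 299)
```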
To address these limitations, we propose an efficient method for the time range query in Section 3.
3. Proposed Method
We propose ZoomSVD, a fast and space-efficient method for extracting key patterns from multiple time series data in an arbitrary time range. We first give an overview of ZoomSVD in Section 3.1. We describe the details of ZoomSVD in Sections 3.2 and 3.3. Finally, we analyze ZoomSVD's time and space complexities in Section 3.4.
3.1. Overview
ZoomSVD efficiently extracts key patterns from multiple time series data in an arbitrary time range using SVD. The main challenges for the time range query problem (Problem 1) are as follows:

Minimize the space cost. The amount of multiple time series data increases over time. How can we reduce the space while supporting time range queries?

Minimize the time cost. How can we quickly compute SVD of multiple time series data in an arbitrary time range?
We address the above challenges with the following ideas:

Compress multiple time series data (Section 3.2). ZoomSVD compresses the raw data using incremental SVD, and discards the raw data in the storage phase.

Optimize the computational time of StitchedSVD (Section 3.3.2). We optimize the performance of StitchedSVD by reducing numerical computations using a block matrix structure.
ZoomSVD comprises two phases: the storage phase and the query phase. In the storage phase (Algorithm 1), ZoomSVD stores the SVD results corresponding to length-b blocks of the time series data in order to support time range queries, as shown in Figure 2. When new data arrive, ZoomSVD incrementally updates the SVD result with the newly arrived data, block by block. In the query phase (Algorithm 2), ZoomSVD returns the SVD result for a given time range [t_s, t_e]. The query phase utilizes our proposed PartialSVD and StitchedSVD modules to process the time range query. PartialSVD (Algorithm 3) manipulates the SVD results of the blocks containing t_s (or t_e) to match the query time range, as shown in Figure 2. StitchedSVD (Algorithm 2) efficiently computes the SVD result between t_s and t_e by stitching the SVD results for the blocks in the time range.
3.2. Storage Phase of ZoomSVD
Given a multiple time series stream X, the objective of the storage phase is to incrementally compress the input data and discard the original input data to achieve space efficiency. A naive incremental SVD would update one large SVD result whenever data are newly added. However, this approach is impractical because the processing cost for the newly added data increases over time. Also, the naive incremental SVD does not support a time range query quickly in the query phase because it manipulates the large SVD result stored for the total time, regardless of the query time range.
The storage phase of ZoomSVD (Algorithm 1) is designed to efficiently process newly added data and quickly support time range queries: it incrementally compresses the input data block by block using incremental SVD, and discards the original input data to reduce space cost. Assume the multiple time series data are represented by a matrix X ∈ R^{n×m}, where n is the time length and m is the number of time series (e.g., sensors). We conceptually divide the matrix X into length-b blocks represented by B^(i), as shown in Figure 2. We then store the low-rank approximation result of each block matrix B^(i), where we exploit an incremental SVD method in the process. We formally define the block matrix in Definition 1.
Definition 1 (Block matrix B^(i)).
Suppose a multivariate time series is X = [x_1; x_2; ⋯; x_n], where x_t is the t-th row vector of X, and [·;·] denotes the vertical concatenation of vectors. The i-th block matrix B^(i) is then represented as follows:

    B^(i) = [x_{(i−1)b+1}; x_{(i−1)b+2}; ⋯; x_{ib}]

where b is the block size. In addition, B_t^(N) denotes the N-th block matrix at time t, where N indicates the index of the most recent block as shown in Figure 2. Note that the number of rows in B_t^(N) is less than or equal to b.
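The partition in Definition 1 amounts to slicing the time axis into consecutive chunks of b rows; a minimal numpy sketch (illustrative names, not the paper's implementation) where the last block may hold fewer than b rows, mirroring B_t^(N):

```python
import numpy as np

def split_into_blocks(X, b):
    """Partition the time axis of X into consecutive blocks of b rows
    (Definition 1); the last block may hold fewer than b rows."""
    return [X[i:i + b] for i in range(0, X.shape[0], b)]

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))      # 10 time ticks, 3 series
blocks = split_into_blocks(X, 4)      # blocks of 4, 4, and 2 rows
```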
The computed SVD result U^(i), Σ^(i), and V^(i) of each block matrix B^(i) is stored as follows.
Definition 2 (Sets of SVD results S_U, S_Σ, and S_V).
The sets S_U, S_Σ, and S_V store the SVD results U^(i), Σ^(i), and V^(i) for all i, respectively.
Note that the original time series data are discarded, and we store only the SVD results which occupy less space than the original data. The SVD results for block matrices are used in the query phase (Algorithm 2). Now we are ready to describe the details of the storage phase.
The storage phase (Algorithm 1) compresses the multiple time series data block by block using incremental SVD to support time range queries. When a new row x_{t+1} of the multiple time series data is given at time t+1 (line 2), we start the SVD result of a new block matrix with x_{t+1} if the SVD result of the previous block was already put into S_U, S_Σ, and S_V at time t (lines 3 and 4). If not, we have the SVD result U_t^(N), Σ_t^(N), and V_t^(N) of the most recent block matrix B_t^(N), which is the N-th block matrix at time t (i.e., the block matrix from time (N−1)b+1 to t, as seen in Figure 2). We then update the SVD result into U_{t+1}^(N), Σ_{t+1}^(N), and V_{t+1}^(N) for the new data x_{t+1} using an incremental SVD method (line 6). If the number of rows of B_{t+1}^(N) becomes b, we put the SVD result U_{t+1}^(N), Σ_{t+1}^(N), and V_{t+1}^(N) into S_U, S_Σ, and S_V, respectively (lines 8–11). Equations (2) and (3) present the details of how to update the SVD result of B_t^(N) for the new incoming data x_{t+1} when B_t^(N) contains fewer than b rows. B_{t+1}^(N) is represented by x_{t+1} and the SVD result of B_t^(N) in Equation (2):
    B_{t+1}^(N) = [B_t^(N); x_{t+1}] = [U_t^(N) 0; 0^T 1] [Σ_t^(N) V_t^(N)T; x_{t+1}]    (2)

where 0 denotes a zero vector of appropriate dimension, and 1 is the 1×1 identity matrix. We then perform SVD to decompose [Σ_t^(N) V_t^(N)T; x_{t+1}] into U'Σ'V'^T:

    B_{t+1}^(N) = ([U_t^(N) 0; 0^T 1] U') Σ' V'^T = U_{t+1}^(N) Σ_{t+1}^(N) V_{t+1}^(N)T    (3)

where U_{t+1}^(N) = [U_t^(N) 0; 0^T 1] U', Σ_{t+1}^(N) = Σ', and V_{t+1}^(N) = V'. Note that U_{t+1}^(N) is a column orthogonal matrix since it is the product of two column orthogonal matrices. V_{t+1}^(N) is also column orthogonal, and Σ_{t+1}^(N) is a diagonal matrix whose diagonal entries are sorted in descending order. Hence, U_{t+1}^(N), Σ_{t+1}^(N), and V_{t+1}^(N) are considered the SVD result of B_{t+1}^(N) by the definition of SVD (Trefethen and Bau III, 1997). The time index t can be omitted, as in B^(i), U^(i), Σ^(i), and V^(i) of Definitions 1 and 2, when the number of rows of B_t^(N) reaches b.
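The row-append update of Equations (2) and (3) can be sketched in numpy (an illustrative single-row update; the full SVD of the small core matrix stands in for an optimized incremental SVD routine, and low-rank truncation is omitted):

```python
import numpy as np

def append_row(U, s, Vt, x):
    """One step of Eqs. (2)-(3): given B = U diag(s) Vt, return the SVD
    of [B; x] using only the stored factors, never B's raw rows."""
    Z = np.vstack([np.diag(s) @ Vt, x[None, :]])   # (k+1) x m core matrix
    Uz, s_new, Vt_new = np.linalg.svd(Z, full_matrices=False)
    # left factor: block matrix [U 0; 0^T 1] times Uz
    n, k = U.shape
    L = np.zeros((n + 1, k + 1))
    L[:n, :k] = U
    L[n, k] = 1.0
    return L @ Uz, s_new, Vt_new

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(B, full_matrices=False)
x = rng.standard_normal(4)
U2, s2, Vt2 = append_row(U, s, Vt, x)   # SVD of the block grown by one row
```

Since L and Uz both have orthonormal columns, the returned left factor is again column orthogonal, matching the argument following Equation (3).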
3.3. Query Phase of ZoomSVD
Given the starting point t_s and the ending point t_e of a time range query, the goal of the query phase of ZoomSVD is to obtain the SVD result from t_s to t_e. A naive approach would reconstruct the time series data from the SVD results of the block matrices ranging between t_s and t_e, and perform SVD on the reconstructed data in the range. However, this approach requires heavy computation, especially for a long time range query, and thus is not appropriate for serving time range queries quickly.
We propose two submodules, PartialSVD and StitchedSVD, which are used in the query phase of our proposed method (Algorithm 2) to efficiently process time range queries by avoiding reconstruction of the raw data. Let i_s be the index of the block matrix including t_s, and i_e be the index of the block matrix including t_e. PartialSVD (Algorithm 3) adjusts the time range of the SVD results for B^(i_s) and B^(i_e), as seen in the red-colored boxes of Figure 3 (line 1 of Algorithm 2). StitchedSVD combines the SVD results of PartialSVD and those of the block matrices from i_s+1 to i_e−1 (lines 2 to 5 in Algorithm 2). We describe the details of PartialSVD and StitchedSVD in Sections 3.3.1 and 3.3.2, respectively.
3.3.1. PartialSVD
This module manipulates the SVD results of the block matrices B^(i_s) and B^(i_e) to return the SVD results in a given time range [t_s, t_e]. As seen in Figure 2, B^(i_s) may contain the time range before t_s, and B^(i_e) may include the time range after t_e. Those time ranges are out of the range of the given query; thus, our goal for this module is to extract SVD results from B^(i_s) and B^(i_e) according to the time range query without reconstructing the raw data. Figure 3 depicts the operation of PartialSVD. For the block matrix B^(i_s) and its SVD U^(i_s) Σ^(i_s) V^(i_s)T, PartialSVD first eliminates the rows of the left singular vector matrix U^(i_s) which are out of the query time range. After that, PartialSVD multiplies the remaining left singular vector matrix by the singular value matrix Σ^(i_s), and performs SVD of the resulting matrix. The resulting left singular vector matrix and singular value matrix constitute part of the output of PartialSVD. The right singular vector matrix output of PartialSVD is computed by multiplying the stored right singular vector matrix V^(i_s) by the right factor W of the new SVD. Similar operations are performed for the block matrix B^(i_e) and its SVD U^(i_e) Σ^(i_e) V^(i_e)T.
Now, we describe the details of this module (Algorithm 3). We first introduce elimination matrices which are used in PartialSVD to adjust the time range.
Definition 3 (Elimination matrices).
Suppose δ_s is the number of rows to be eliminated in U^(i_s) according to t_s. Then b_s = b − δ_s is the number of remaining rows in U^(i_s). Similarly, let δ_e be the number of rows to be eliminated in U^(i_e) according to t_e; then b_e = b − δ_e is the number of remaining rows in U^(i_e). The elimination matrices E_s ∈ R^{b_s×b} and E_e ∈ R^{b_e×b} for U^(i_s) and U^(i_e) are defined as follows:

    E_s = [0_{b_s×δ_s} | I_{b_s}],    E_e = [I_{b_e} | 0_{b_e×δ_e}]    (4)

where I denotes an identity matrix and 0 a zero matrix.
The matrices U^(i_s) and U^(i_e) are multiplied by the elimination matrices, and the time ranges of the resulting matrices E_s U^(i_s) and E_e U^(i_e) lie within the query time range [t_s, t_e]. PartialSVD constructs those elimination matrices based on t_s and t_e (line 3 of Algorithm 3). The filtered block matrix B̂^(i_s) is given by

    B̂^(i_s) = E_s U^(i_s) Σ^(i_s) V^(i_s)T    (5)

where U^(i_s) Σ^(i_s) V^(i_s)T was computed in the storage phase.
PartialSVD decomposes E_s U^(i_s) Σ^(i_s) into U_s Σ_s W^T via SVD and low-rank approximation with threshold ξ, since E_s U^(i_s) is not a column orthogonal matrix and Equation (5) is therefore not in the form of an SVD result; then, Equation (5) is written as follows:

    B̂^(i_s) = U_s Σ_s W^T V^(i_s)T = U_s Σ_s (V^(i_s) W)^T = U_s Σ_s V_s^T    (6)

where U_s ∈ R^{b_s×k_s}, Σ_s ∈ R^{k_s×k_s}, and V_s = V^(i_s) W ∈ R^{m×k_s}. In line 4, PartialSVD performs SVD on E_s U^(i_s) Σ^(i_s), and in line 5 it computes V_s = V^(i_s) W. PartialSVD similarly computes the SVD result U_e Σ_e V_e^T of B̂^(i_e) in lines 6–7 of Algorithm 3.
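A compact numpy sketch of the PartialSVD idea (illustrative only: a boolean row mask plays the role of the elimination matrix of Equation (4), and the low-rank truncation with ξ is omitted for brevity):

```python
import numpy as np

def partial_svd(U, s, Vt, keep):
    """PartialSVD sketch: drop the rows of U outside the query range,
    re-decompose the small matrix U[keep] @ diag(s) so the output is
    again a valid SVD, and fold W into the right factor (Equation (6))."""
    Uk = U[keep]                                   # remaining rows of U
    Up, sp, Wt = np.linalg.svd(Uk @ np.diag(s), full_matrices=False)
    return Up, sp, Wt @ Vt                         # right factor (V W)^T

rng = np.random.default_rng(4)
B = rng.standard_normal((8, 3))                    # one stored block
U, s, Vt = np.linalg.svd(B, full_matrices=False)   # its block SVD
keep = np.arange(8) >= 2                           # query covers rows 2..7
Up, sp, Vst = partial_svd(U, s, Vt, keep)
```

Only the small factors are touched; the raw block rows are never rebuilt, which is the point of this module.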
3.3.2. StitchedSVD
This module combines the PartialSVD results of B^(i_s) and B^(i_e) with the stored SVD results of the block matrices in the query time range, to return the final SVD result corresponding to the query range, as shown in Figure 2. A naive approach is to reconstruct the data blocks using the stored SVD results and perform SVD on the reconstructed data of the given query time range. However, this approach cannot provide fast query speed for a long time range due to the heavy computation induced by the reconstruction and the following SVD. The goal of StitchedSVD is to efficiently stitch the SVD results in the query time range, avoiding reconstruction and minimizing the numerical computation of matrix multiplications.
Specifically, StitchedSVD stitches several consecutive block SVD results together to compute the SVD corresponding to the query time range: i.e., it combines the SVD results U^(i), Σ^(i), and V^(i) of the i-th block matrix B^(i), for i_s ≤ i ≤ i_e, to compute the SVD U_(s,e), Σ_(s,e), and V_(s,e). The main idea is 1) to carefully decouple the left singular vector matrices U^(i) from the blocks, 2) construct a stacked matrix containing Σ^(i) V^(i)T for i_s ≤ i ≤ i_e, 3) perform SVD on the stacked matrix to get the singular value matrix and the right singular vector matrix of the final SVD result, and 4) carefully combine the U^(i)'s with the left singular vector matrix of the SVD of the stacked matrix to get the left singular vector matrix of the final SVD result.
Lines 2 to 5 of Algorithm 2 present how the stitched SVD matrices are computed. First, we represent X_(s,e) in a block matrix structure, X_(s,e) = [B̂^(i_s); B^(i_s+1); ⋯; B^(i_e−1); B̂^(i_e)], where B̂^(i_s) is equal to E_s B^(i_s) and B̂^(i_e) is equal to E_e B^(i_e). After organizing the block matrix structure of X_(s,e), we define the block diagonal matrix as follows.
Definition 4 (Block diagonal matrix).
Suppose U_s and U_e are the left singular vector matrices produced by PartialSVD. Let U^(i_s+1), ⋯, U^(i_e−1) be the left singular vector matrices in S_U. The block diagonal matrix L is defined as follows:

    L = diag(U_s, U^(i_s+1), ⋯, U^(i_e−1), U_e)
Then, the matrix X_(s,e) corresponding to the time range query is represented as follows:

    X_(s,e) = L [Σ_s V_s^T; Σ^(i_s+1) V^(i_s+1)T; ⋯; Σ^(i_e−1) V^(i_e−1)T; Σ_e V_e^T]    (7)

where U_s Σ_s V_s^T and U_e Σ_e V_e^T are the PartialSVD results for the blocks containing t_s and t_e. As we apply SVD and low-rank approximation to the stacked matrix in Equation (7), it becomes as follows:

    X_(s,e) = L U'' Σ'' V''^T = (L U'') Σ'' V''^T = U_(s,e) Σ_(s,e) V_(s,e)^T    (8)
where U'' Σ'' V''^T is the SVD of the stacked matrix computed with low-rank approximation, Σ_(s,e) = Σ'', V_(s,e) = V'', and U_(s,e) = L U''. To avoid matrix multiplication between the singular vector blocks and the zero submatrices of L, we split U'' block by block as follows:

    U'' = [U''_s; U''_(i_s+1); ⋯; U''_(i_e−1); U''_e]

where U''_s and U''_e correspond to U_s and U_e, respectively, and U''_i corresponds to U^(i) for i_s+1 ≤ i ≤ i_e−1. Then U_(s,e) of Equation (8) is computed as follows:

    U_(s,e) = L U'' = [U_s U''_s; U^(i_s+1) U''_(i_s+1); ⋯; U^(i_e−1) U''_(i_e−1); U_e U''_e]    (9)
The column orthogonality of U_(s,e) is established since it is the product of two column orthogonal matrices; V_(s,e) is also column orthogonal. Note that we perform PartialSVD to satisfy the column orthogonality condition before performing StitchedSVD.
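The stitching procedure of Equations (7)–(9) can be sketched in numpy as follows (illustrative only: low-rank truncation with ξ is omitted, and each input triple stands for a stored block SVD or a PartialSVD output):

```python
import numpy as np

def stitched_svd(factors):
    """StitchedSVD sketch: stack the small matrices diag(s_i) @ Vt_i,
    decompose the stack once, then fold each left factor U_i back in
    block by block as in Equation (9), never forming the block diagonal
    matrix or the raw data explicitly."""
    stacked = np.vstack([np.diag(s) @ Vt for _, s, Vt in factors])
    Um, s_out, Vt_out = np.linalg.svd(stacked, full_matrices=False)
    rows, out = 0, []
    for U, s, _ in factors:             # block-wise product L @ Um
        k = len(s)
        out.append(U @ Um[rows:rows + k])
        rows += k
    return np.vstack(out), s_out, Vt_out

rng = np.random.default_rng(5)
blocks = [rng.standard_normal((5, 3)) for _ in range(3)]
factors = [np.linalg.svd(B, full_matrices=False) for B in blocks]
U, s, Vt = stitched_svd(factors)        # SVD of the concatenated range
```

The block-wise loop realizes Equation (9): each stored U^(i) multiplies only its own slice of U'', skipping the zero submatrices of the block diagonal matrix.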
3.4. Theoretical Analysis
We theoretically analyze our proposed method ZoomSVD in terms of time and memory cost. Note that a collection of multiple time series data X ∈ R^{n×m} is a dense matrix, and the time complexity of computing the SVD of X is O(min(n²m, nm²)).
Time Complexity. We analyze the time complexities of the storage and the query phases in Theorems 5 and 6, respectively.
Theorem 5.
When a vector x_{t+1} ∈ R^{1×m} is given at time t+1, the computation cost of the storage phase of ZoomSVD is O(k²(b+m)), where k is the number of singular values.
Proof.
Performing SVD of [Σ_t^(N) V_t^(N)T; x_{t+1}] takes O(k²m), and the multiplication of [U_t^(N) 0; 0^T 1] and U' takes O(k²b), since the number of rows of U_t^(N) is always smaller than or equal to the block size b. Assume k_t and k_{t+1} are equal to k. The total computational cost of the storage phase of ZoomSVD is O(k²m + k²b). We simply express the computational cost of storing the incoming data at each time tick as O(k²(b+m)). ∎
In Theorem 5, the computation for storing the incoming data at each time tick takes constant time, since b and m are constants and k is smaller than b.
Theorem 6.
Given a time range query [t_s, t_e], the time cost of the query phase (Algorithm 2) is O((t_e − t_s)k²), where k is the number of singular values for the query range.
Proof.
It takes O(bk_s² + bk_e²) time to compute PartialSVD, where k_s and k_e are the numbers of singular values computed by PartialSVD (line 1 in Algorithm 2). The computational time to perform SVD of the stacked matrix in StitchedSVD depends on Σ_{i=i_s}^{i_e} k_i and m, since the vertical and horizontal lengths of the matrix are Σ_{i=i_s}^{i_e} k_i and m, respectively (lines 2–4 in Algorithm 2). Also, the block matrix multiplication for U_(s,e) (line 5 in Algorithm 2) takes O(Σ_{i=i_s}^{i_e} b k_i k), where k is the number of singular values with respect to the SVD result of the given time range [t_s, t_e]. Let all the k_i's in the query phase be k, and let m be larger than k; also, note that (i_e − i_s + 1)b is at least t_e − t_s and exceeds it by at most 2b. Then, the computational time of PartialSVD is O(bk²) and that of StitchedSVD is O((t_e − t_s)k²). Treating b and m as constants, we can simply express the computational cost of the query phase of ZoomSVD as O((t_e − t_s)k²).
∎
Theorem 6 implies that the computational time of ZoomSVD in the query phase depends linearly on the length of the time range.