Construction of wavelet dictionaries for ECG modelling

09/21/2019 ∙ by Dana Cerna, et al. ∙ 0

The purpose of sparse modelling of ECG signals is to represent an ECG record, given by sample points, as a linear combination of as few elementary components as possible. This can be achieved by creating a redundant set, called a dictionary, from where the elementary components are selected. The success in sparsely representing an ECG record depends on the nature of the dictionary being considered. In this paper we focus on the construction of different families of wavelet dictionaries, which are appropriate for the purpose of reducing dimensionality of ECG signals through sparse representation modelling. The suitability of wavelet dictionaries for ECG modelling, applying the Optimized Orthogonal Matching Pursuit approach for the selection process, was demonstrated in a previous work on the MIT-BIH Arrhythmia database consisting of 48 records each of which of 30 min length. This paper complements the previous one by presenting the technical details, methods, algorithms, and MATLAB software facilitating the construction of different families of wavelet dictionaries. The implementation allows for straightforward further extensions to include additional wavelet families. The sparsity in the representation of an ECG record significantly improves in relation to the sparsity produced by the corresponding wavelet basis. This result holds true for the 17 wavelet families considered here.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The electrocardiogram (ECG) is a routine test for clinical medicine. It plays a crucial role in the diagnosis of a broad range of anomalies in the human heart; from arrythmias to myocardial infarction.

The widely available digital ECG data has facilitated the development of algorithms for ECG processing and interpretation. In particular, the literature for computerized arrhythmia detection and classification is extensive. Useful review matterial LSC16 ; LMM18 can help with the introduction to state of the art techniques, which nonetheless keeps growing AFA17 ; HRH19 ; ANP18 ; LYM17 ; Pla18 .

A common first step in ECG modeling consists in reducing the dimensionality of the signal. This entails to represent the informational content of the record by means of significantly fewer parameters than the number of samples in the digital ECG. When the aim is to reproduce the original signal at low level distortion, the step is frequently realized through transformations such as the Wavelet Transform and the Discrete Cosine Transform. In the last few years alternative approaches, falling within the category of sparse representation of ECG signals, have been considered.

Within the sparse representation framework, an ECG record is represented as a linear combination of elementary components, called atoms, which are selected from a redundant set, called a dictionary. The success of the methods developed within this framework depends on both, the selection technique and the proposed dictionary. The selection techniques which are widely applied for sparse representation of general signals are either greedy pursuit strategies MZ92 ; PRK93 ; RNL02 , or strategies based on minimization of the 1-norm as a cost function CDS98 . Suitable dictionaries depend on the class of signals being processed. These can be designed at hoc or be learned from training data. Sparse representation of ECG signals has been tackled by both these approaches, e.g. AGL15 learns dictionaries using some part of the records for ECG compression and RR18 uses Gabor dictionaries for structuring features for classification.

In a recent publication RNC19 we have shown that wavelet dictionaries, derived from known wavelet families, are suitable for representing an ECG record as a linear combination of fewer elementary components than those required by a wavelet basis. The model was shown to be successful for dimensionality reduction and lossy compression. As far as compression is concerned the method advanced in RNC19 produces compression results improving upon previously reported benchmarks LKL11 ; MZD15 ; PSS16 ; TZW18 for the MIT-BIH Arrhythmia data set without pre-processing. With regard to dimensionality reduction, wavelet dictionaries considerably improve upon the results achieved with the wavelet basis of the same family RNC19 ; LRN19 . This result motivated the present Communication. While in RNC19 the dictionaries have been used to demonstrate their suitability for dimensionality reduction of ECG signals at low level distortion, the details of their numerical construction were not given. This paper complements the previous work by presenting the algorithms for building dictionaries from the following mother wavelet prototypes:

  1. Chui-Wang linear spline wavelet CW92

  2. Chui-Wang quadratic spline wavelet CW92

  3. Chui-Wang cubic spline wavelet CW92

  4. primal CDF97 wavelet CDF92

  5. dual CDF97 wavelet CDF92

  6. primal CDF53 wavelet CDF92

  7. linear spline wavelet with short support and 2 vanishing moments

    Chen95 ; Han06

  8. quadratic spline wavelet with short support and 3 vanishing moments Chen95 ; Han06

  9. cubic spline wavelet with short support and 4 vanishing moments Chen95 ; Han06

  10. Daubechies wavelet with vanishing moments Daub88

  11. Daubechies wavelet with vanishing moments Daub88

  12. Daubechies wavelet with vanishing moments Daub88

  13. symlet with vanishing moments Daub93

  14. symlet with vanishing moments Daub93

  15. symlet with vanishing moments Daub93

  16. coiflet with 2 vanishing moments and support of length 6 which is the most regular Daub93

  17. coiflet with 3 vanishing moments and support of length 8 Daub93

The method proposed in RNC19 for modelling a given ECG signal proceeds as follows. Assuming that the signal is given as an -dimensional array, this array is partitioned into cells . Thus, each cell is an

-dimensional vector, which is modeled by an atomic decomposition of the form

(1)

For each cell , the atoms are selected from a dictionary through the greedy Optimized Orthogonal Matching Pursuit (OOMP) algorithm RNL02 ; LRN16 . The array is a vector whose components contain the indices of the selected atoms for decomposing the -th cell in the signal partition. The OOMP method, for selecting these indices and computing the corresponding coefficients in (1), is fully implemented by the OOMP function included as a tool of the software.

Each of the proposed dictionaries consists of two components. One of the components contains a few elements, say , from a discrete cosine basis. This component of the dictionary allows for the fact that ECG signals are normally superimposed to a smooth background. It is given as a matrix . The other component is the wavelet-based dictionary, which is given as a matrix . Thus, the whole dictionary is an matrix obtained by the horizontal concatenation of and . The next section is dedicated to the construction of .

The paper is organized as follows. Sec. 2 gives all the details for the construction of different wavelet prototypes and the concomitant wavelet dictionaries generated by those prototypes. Secs 3 and 4 deliver details and examples demonstrating the use of the MATLAB software for modelling ECG signals within the proposed framework.

The software has been made available on a dedicated webpage RNC19a . The implementation allows for straightforward further extension of the options for wavelet types.

2 Method

In this section we produce all the pseudo-codes for the construction of wavelets dictionaries, which can be used to achieve the model of every segment in a signal partition. As already mentioned, each dictionary is obtained by taking the prototypes from a wavelet basis and translating them within a shorter step than that corresponding to the wavelet basis.

Throughout the paper we adopt the following notation. Boldface fonts are used to indicate Euclidean vectors and matrices. Standard mathematical fonts are used to indicate components, e.g., is a vector of -components and is a matrix of elements . The symbol denotes the space of square integrable functions.

Wavelets are usually constructed starting from a multiresolution analysis, which is a sequence of closed subspaces of the space which are nested and their union is dense in , i.e.,

(2)

We assume that there exists a function such that for functions

(3)

form uniformly stable bases of the spaces , i.e., the bases are Riesz bases with bounds independent of the level , see e.g. Cohen2003 . The functions are called scaling functions and the function is called a generator of scaling functions. Next we present a method for the actual construction of the scaling functions.

2.1 Generation of scaling functions

We assume that has a compact support for some . From the nestedness of the multiresolution spaces , it follows that there exists a scaling filter such that

(4)

If then, integrating (4), we obtain

(5)

which implies that has to be normalized such that

(6)

The scaling equation (4) enables computing values of the scaling function at points for , . First we compute values of at integer points. Since , we have for . Let us define a vector

(7)

where the indicates the transpose operation. Substituting into (4), we obtain

We set for and and define a matrix by

(9)

Then, (2.1) is equivalent to

(10)

This means that

is an eigenvector corresponding to the eigenvalue

of the matrix . If the multiplicity of this eigenvalue is , then is given uniquely up to a multiplication by a constant. Our aim is to compute a vector such that

(11)

for a chosen level . From (7) and (11) we have

(12)

We compute values of at points . Note that for even we already know these values. Using (4) and (12) we obtain

for . Similarly, we compute values of at points , and thus we continue until we determine values at points . More precisely, for we assume that we know values of at , and we compute the values

(14)

Using (4) we obtain

Remark 1

Some scaling functions such as spline scaling functions are known in an explicit form and their values can be evaluated directly. However, an advantage of our approach is that it is more general and can be used for a large class of wavelet families.

2.2 Construction of wavelet generators from scaling functions

Let be complement spaces such that , where denotes a direct sum. Wavelet functions are constructed in the form:

(16)

to be a basis for and such that

(17)

called a wavelet basis, is a Riesz basis of the space .

Since there exists a vector such that

(18)

The vector is called a wavelet filter. From (18) we have

(19)

In Algorithm 1 we compute a vector such that

(20)

in the following way. Due to (18) and (20), we have

(21)

The sum in the last equation is computed as a cyclic sum. For

(22)

we set and for we do

(23)

if . Using the substitution

(24)

for , we obtain

(25)

Algorithm 1 computes vectors and for given scaling and wavelet filters. The filters corresponding to the wavelet families supported by the software are given in Appendix A (Algorithm 8).

Algorithm 1 Procedure [,] = WaveletGen(,,u)
   Input:
scaling filter
wavelet filter
level (integer) that determines points
   Output:
vector of scaling function values (c.f. (11))
vector of wavelet values (c.f. (20))
   {support length of }
   {normalization of (c.f. (6))}
  {Compute a matrix using (9)}
  
  for   do
     for   do
        if  then
           
        end if
     end for
  end for
  {Compute eigenvalues and eigenvectors of the matrix }
  [V,D]=eig()
  {Find an index of a column corresponding to eigenvalue 1}
   { is the multiplicity of eigenvalue 1}
  for   do
     if   then
         column=;
     end if
  end for
  if   then
      error(‘Impossible to construct scaling function: eigenvalue 1 must have multiplicity 1’)
  else
     
     {Eigenvector represents values of at integer points}
      {c.f. (12)} {Compute values of at points }
     for   do
        for   do
            ; {c.f. (14)}
           for   do
              if   then
                  {c.f. (2.1)}
              end if
           end for
        end for
     end for
      {Compute containing values of at points }
     
     for   do
         ;
         {c.f. (25)}
     end for
  end if

2.3 Construction of wavelet bases and dictionaries

Hereafter we drop all normalization factors and normalize all the vectors once they have been constructed. Note that in (17) we used a translation parameter and since is a Riesz basis the functions from are linearly independent. Now, we choose a parameter such that for some integer . We define functions

(26)

and

(27)

which form a redundant dictionary ARN05 ; ARN08 ; RNX10 . Obviously, corresponds to a basis.

The left graph of Figure 1 shows two consecutive wavelet functions taken from a linear spline bases CW92 . The right graph of Figure 1 corresponds to two consecutive wavelet functions taken from the dictionary spanning the same space which corresponds to .

Figure 1: Wavelet functions taken from a basis (left) and a dictionary (right) corresponding to a linear spline-wavelet prototype from CW92 .

Algorithm 2 constructs a discrete dictionary, i.e., a matrix which contains values of functions from (26) and (27) at equidistant points for some chosen levels determined by the vector . Since Algorithm 1 enables us to construct values at points of the form , we evaluate functions (26) and (27) at the points

(28)

where denotes the smallest integer number larger than .

For a chosen vector of levels , we define a vector of indices such that is the number of scaling functions at level , and is the number of wavelets at level for , where is the length of . We have

(29)

Comparing the supports of these functions and the interval

(30)

which contains the points from (28), we find that the number of inner scaling functions, i.e., scaling functions with the whole support in , is

(31)

where the symbol denotes the largest integer number smaller than . The number of left boundary scaling functions, i.e., functions that have only a part of the support in the interior of and their support contains , is

(32)

and similarly the number of right boundary scaling functions is

(33)

Hence, we have

(34)

Similarly, the number of wavelet functions on the level is

(35)

The first columns of contain values of scaling functions (26), which restricted to are not identically zero, at points given in (28), i.e.,

(36)

for , . The above equation can be recast:

where is defined by (11) for the level . Using the substitution , we obtain

(38)

under the assumption that .

The other columns of contain values of wavelet functions (27) for levels at points (28), i.e.,

(39)

for , , and . Similarly as above we obtain

(40)

where and is defined by (20) for the level .

The following procedure WaveletDict computes a wavelet dictionary.

Algorithm 2 Procedure [] = WaveletDict(namef, , , )
  Input:
namef name of a wavelet family, for available choices see Appendix A
number of points
vector of levels
translation factor for some integer
  Output:
wavelet dictionary
is the number of scaling functions at level , and for is the number of wavelets at level
cell array such that if the -th column of corresponds to values of a scaling function or a wavelet ; type=‘inner’ or ‘boundary’ characterizes type of a function; function=‘scaling’ or ‘wavelet’ indicates whether the column corresponds to the values of a scaling function or a wavelet
  {Compute scaling and wavelet filters using Algorithm 8 from Appendix A}
   [,,correct_name] = Filters(namef)
  {Test if a wavelet family namef is available}
  if  correct_name then
     
  end if
   {support length of }
   {support length of }
   {level characterizing (c.f. (28)}
  {Remove levels from that contain no inner function}
   {coarsest possible level}
   {removing the levels smaller than } {Test of parameters}
   {parameter from }
  if  then
      fprintf(‘no inner functions for these values of levels , increase ’)
     
  else if  then
      fprintf(‘small number of points for these values of and ’)
     
  end if
  {Compute scaling and wavelet generators using Algorithm 1}
  []=WaveletGen()
   {Compute number of scaling functions at level }
  
   {c.f. (34)}
  {Compute number of wavelets for level }
  for  do
      {c.f. (35)}
  end for{Compute columns of corresponding to scaling functions}
  
   {c.f. (32)}
   {c.f. (33)}
  {Compute columns corresponding to inner scaling functions (c.f. (38)}
  for  do
     
     
  end for
  {Compute columns corresponding to boundary scaling functions (c.f. (38)}
  for  do
     
     
  end for
  for   do
      {index of column}
     
     
  end for{Compute columns of corresponding to wavelets (c.f. (40))}
   {number of functions on previous levels}
  for  do
     
     
     
     for  do
        
        
     end for
     for  do
        
        
     end for
     for  do
        
        
        
     end for
     
  end for

The main procedure GenDict validates input parameters, generates dictionaries and normalizes their columns.

Algorithm 3 Procedure []= GenDict(name,pars)
  Input:
name name of a wavelet family, for available choices see Appendix A
pars parameters in the form pars =
  Description of the parameters:
number of points
vector of levels
translation factor for some integer
  Output:
wavelet dictionary
is the number of scaling functions at level , and for is the number of wavelets at level
cell array such that , if the -th column of corresponds to values of scaling function or wavelet ; type=‘inner’ or ‘boundary’ characterizes type of a function; function=‘scaling’ or ‘wavelet’ indicates whether the column corresponds to the values of a scaling function or a wavelet
  {Define cell array of names of all available families}
  families= ‘CW2’,‘CW3’,‘CW4’,‘CDF97’,‘CDF97d’,‘CDF53’, ‘Short4’,‘Short3’, ‘Short2’, ‘Db3’,‘Db4’,‘Db5’,‘Sym3’,‘Sym4’, ‘Sym5’,‘Coif26’,‘Coif38’
  {Validate input parameters}
  if   then
      error(‘Need 2 input arguments’)
  end if
  if   then
     error(‘Name must be a string’)
  end if
  
  if  then
     error(‘I expect ’)
  end if
  
  if  then
     fprintf(‘Choose such that for some integer ’)
     
  else if ismember(,families)  then
     {Generate dictionary using Algorithm 2}
      [] = WaveletDict(namef, , , )
     {Normalize columns of using Algorithm 9 from Appendix A}
      = NormDict(,1)
  else
      error(‘Unknown name of a wavelet family’)
  end if
Remark 2

It is worth remarking that the range of scales, say depends on length of the signal partition. For a signal segment of length a dictionary contains values of scaling functions and wavelets at points for some integer . For a signal segment of length a dictionary contains values of functions at points . Thus we have

(41)

and

(42)

Therefore, nonzero elements of vectors on the level in a dictionary for correspond to nonzero elements of vectors on the level in a dictionary for . This situation is illustrated in Figure 2, where vectors of values are displayed for and and for and . Note that the nonzero elements in these vectors are the same. Therefore, if for the signal segment of length the vector is used, then we recommend to use the vector for the signal segment of length , and similarly to use levels for the signal segment of length .

Figure 2: Vectors of values for and (left) and for and (right).

To build dictionaries for the wavelet family ‘Short3’ at levels 2 and 3, for translation parameter , and the number of points , use the procedure TestDict below.

Algorithm 4 Procedure TestDict
   namef=‘Short3’;
   []=GenDict(namef,)

The output is the matrix of size and the vector . This means that there are scaling functions at level 2, 27 wavelets at level 2, and 43 wavelets at level 3. The cell array characterizes functions corresponding to columns of . For example

(43)

which means that th column of the matrix contains values of a wavelet function . This wavelet is a boundary wavelet, i.e., only a part of its support lies in the interval defined by (30). Some of the vectors from this dictionary corresponding to values of scaling functions are displayed in Figure 3 and some of the vectors corresponding to values of wavelets are displayed in Figure 4.

Figure 3: Plots of vectors from the dictionary from Example 1 corresponding to scaling functions on the level .
Figure 4: Plots of vectors from the dictionary from Example 1 corresponding to wavelets on the level .

2.4 Construction of dictionaries for ECG modelling

As mentioned in Sec. 1, because ECG signals are usually superimposed to a baseline or smooth background, the full dictionary we use for ECG modelling is built as follows

(44)

where is the output of Algorithm 5 and is a matrix containing a few low frequency components from a discrete cosine basis. Before normalization is given as

(45)

where is a small number in comparison to . For the numerical examples of the next section we consider . Algorithm 5 computes .

Algorithm 5 Procedure = DCos()
   Input:
the size of the Euclidean space the vectors should belong to
number of frequencies to use starting from 0
   Output:
matrix whos columns are discrete cosine vectors
  
  
   = NormDict(,1)


2.5 Method for construction of the model

In this section we present the procedures for constructing the ECG signal model (c.f. Algorithm 6) and for calculating the assessment metrics. The quality of the signal approximation is assessed with respect to the defined as follows

(46)

where is the original signal and is the signal reconstructed by concatenation of the approximated segments .

The local with respect to every segment in the signal partition is indicated as and calculated as

(47)

For the signal approximation the OOMP method is stopped through a fixed value so as to achieve the same value of for all the segments in the records. Assuming that the target before quantization is we set .

The goal of the signal model is to approximate each segment in the signal partition using as few atoms as possible. Thus, for a fixed value of , the sparsity of the signal representation is assessed by the sparsity ratio (SR)

(48)

where is the total length of the signal and with the number of atoms in the atomic decomposition (1) of each segment of length . The corresponding quantity evaluated for every cell in the partition is the local sparsity ratio

(49)

This local quantity is relevant to the detection of non-stationary noise, significant distortion in ECG patterns, or changes of morphology in the heart beats.

Given an ECG signal the procedure described in Algorithm 6 constructs the signal approximation, , using the dictionaries introduced in the previous section.

Algorithm 6 Procedure []= SignalModel()
   Input:
signal
number of points in each segment of the partition
parameter to control the approximation error
name of a wavelet family
parameters as described in Algorithm 3
number of components in the cosine subdictionary
   Output:
approximated signal