Union of Low-Rank Subspaces Detector

07/29/2013
by Mohsen Joneidi, et al.

The problem of signal detection using a flexible and general model is considered. Due to its applicability and flexibility, sparse signal representation and approximation has attracted a lot of attention in many areas of signal processing. In this paper, we propose a new detection method based on sparse decomposition in a union of subspaces (UoS) model. Our proposed detector uses a dictionary that can be interpreted as a bank of matched subspaces. This improves the performance of signal detection, as it generalizes classical detectors. The low-rank assumption for the desired signals implies that their representations in terms of some proper bases are sparse, and our proposed detector exploits this sparsity in its decision rule. We demonstrate the high efficiency of our method in the case of voice activity detection in speech processing.


I Introduction

Sparse approximation techniques have found wide use due to their benefits and high flexibility in many applications in image and signal processing [1], [2]. Sparse representation can efficiently extract the most important features of a signal, so it provides very promising results in data compression [3], de-noising [4], blind source separation [5], signal classification [6], and so on. Methods based on exploiting signal sparsity have two main steps. First, an over-complete dictionary [1] is selected or learned according to the structural characteristics of the set of signals, and then the target signal is decomposed over the dictionary to obtain a compact representation. Representation in terms of a few designed or learned bases can accurately capture the signal structure, which, in turn, leads to an improvement in the distinction between noise/interference and structured signals.

In some signal processing applications, the task is to detect the presence of a signal from its noisy measurements. For example, in speech processing, Voice Activity Detection (VAD) is performed to distinguish speech segments from non-speech segments in an audio stream. VAD plays a critical role in increasing the capacity of transmission and speech storage by reducing the average bit-rate [7].

Signal detection is an old problem in signal processing, and there are several traditional detectors, including the energy detector, the matched filter and the matched subspace detector [8]. The matched signal detector is the most basic framework for signal detection; it needs a bank of matched signals to design a detector system. However, in many applications it is preferable to replace rank-1 signals by a multirank matched subspace [8]. The matched subspace detector models the desired signals as the span of a subspace and rejects the part of the signal which lies in the null-space of the assumed subspace. The generalized likelihood ratio test (GLRT) for the matched subspace detector is the uniformly most powerful invariant (UMP-invariant) statistic for detection [8]. The subspace model for detection needs a set of bases spanning the desired signals, which can be fixed bases like the discrete Fourier transform (DFT) or data-dependent bases like those of principal component analysis (PCA). Although the subspace model is more adaptive for signal analysis, it needs several parameters that must be either known or estimated: for example, the set of bases spanning the desired signals, the coefficients of the bases, the noise covariance and the signal-to-noise ratio (SNR). Depending on the knowledge about these parameters, the optimum statistic is suggested in [9] for four situations. In the case of unknown coefficients, the orthogonal projection of the observation is used to determine the contribution of each basis. The present paper assumes a more general model for the signals, which considers a union of subspaces.

Sparsity has been widely exploited for detection purposes, e.g., abnormal event detection [10], voice activity detection [11], and face detection [12]. A multi-criteria detection scheme based on intelligent switching between traditional detection and sparse detection is proposed in [13]. In these works, sparsity has been used to extract features or to define a heuristic criterion for detection. The compressive detector is another application of sparsity to signal detection: it is able to detect signals using only a few measurements of the original samples, without the performance degrading dramatically [14], [15]. The goal of the compressive detector is to preserve the performance of the original detector. In this paper we use sparsity from a different point of view: traditional detectors are generalized to incorporate sparsity in the optimum decision rule, and a new trade-off is suggested between sparsity (the rank of a subspace) and the error of projection (the distance to a low-rank subspace).

In this paper, we propose a new signal detection method based on the union of low-rank subspaces (ULRS) model [16], [17]. This model is able to reveal the intrinsic structure of a set of signals. The proposed detector is a generalized version of traditional detectors; in other words, imposing a union of rank-1 subspaces model on the desired signals yields nothing other than the traditional matched filter bank. We investigate our detector from different points of view in order to show the relation between our method and other classical detectors. We also derive a robust version of the proposed detector in order to provide robustness against outliers and gross errors. We provide theoretical investigations as well as experimental results on VAD.

The rest of the paper is organized as follows. Section 2 provides a brief background on sparse representation theory and basic concepts of detection theory. In Section 3 we describe our new signal detection method, study its performance and provide its robust version. Section 4 experimentally demonstrates the effectiveness of our proposed signal detection method. Finally, Section 5 concludes the paper with a summary of the proposed work.

II Theoretical Background and Review

II-A Basic Theory of Sparse Decomposition

Sparse decomposition of signals over a set of basis functions has attracted a lot of attention during the last decade [1]. In this approach, one wants to approximate a given signal as a linear combination of as few basis functions as possible. Each basis function is called an atom, and their collection is called a dictionary [18]. The dictionary is usually over-complete, i.e., the number of atoms is much larger than their dimension. Specifically, let $\mathbf{y} \in \mathbb{R}^n$ be the signal which is sparsely represented over the dictionary $\mathbf{D} \in \mathbb{R}^{n \times K}$ with $K > n$. This amounts to the following problem,

$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2 \le \epsilon \qquad (1)$$

where $\|\cdot\|_0$ stands for the so-called $\ell_0$ pseudo-norm, which counts the number of nonzero elements. Many algorithms have been introduced to solve the problem of finding the sparsest approximation of a signal in a given over-complete dictionary (for a good review see [19]). For a specified class of signals, e.g., the class of natural images, the dictionary should have the capability of sparsely representing the signals. In some applications there is a predefined and fixed dictionary which is well matched to the contents of the specific class of signals; the over-complete DCT dictionary for the class of natural images is an example. These non-adaptive dictionaries are favorable because of their simplicity. On the other hand, a learned dictionary better matches the contents of the signals [1]. Most dictionary learning algorithms are indeed a generalization of clustering algorithms: while in clustering each training signal is assigned to only one atom (cluster center), in dictionary learning each signal is allowed to use more than one atom, provided that it uses as few atoms as possible. The general dictionary learning problem can be stated as follows,

$$\min_{\mathbf{D},\mathbf{X}} \|\mathbf{S} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{x}_i\|_0 \le T,\ \forall i \qquad (2)$$

where the columns of $\mathbf{S}$ contain the observed data and the columns $\mathbf{x}_i$ of $\mathbf{X}$ are sparse representations of the observed data. Most dictionary learning algorithms solve the above problem by alternately minimizing it over $\mathbf{D}$ and $\mathbf{X}$. Dictionary learning algorithms differ mainly in how they perform the minimization over the dictionary. Dictionary learning plays an important role in sparse decomposition based methods; a subsection of the proposed method section is devoted to a discussion of dictionary learning.
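To make the decomposition step concrete, the following is a minimal sketch of orthogonal matching pursuit (OMP), a standard greedy solver for problem (1); the function name and interface are ours, not the paper's, and the dictionary is assumed to have unit-norm columns.

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy OMP sketch for problem (1): approximate y with at most
    `sparsity` atoms of the dictionary D (assumed unit-norm columns)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # re-fit all selected coefficients jointly by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```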

II-B Basic Theory of Detection

In this section we review signal detection theory and study the detectors related to our proposed one. First consider the following model for detection,

$$H_0: \mathbf{y} = \mathbf{n}, \qquad H_1: \mathbf{y} = \mathbf{s} + \mathbf{n} \qquad (3)$$

where $\mathbf{y}$ is the observation vector, $\mathbf{s}$ is the signal of interest and $\mathbf{n}$ is the observation noise of the model. First we assume that the probability density functions of $\mathbf{s}$ and $\mathbf{n}$ are known. In this case the likelihood ratio test (LRT) gives,

$$L(\mathbf{y}) = \frac{p(\mathbf{y} \mid H_1)}{p(\mathbf{y} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \qquad (4)$$

where $\eta$ is a threshold that satisfies the desired probability of false alarm. Under a Gaussian assumption on the noise with covariance matrix $\mathbf{R}$, the LRT simplifies to,

$$t(\mathbf{y}) = \mathbf{s}^T \mathbf{R}^{-1} \mathbf{y} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \qquad (5)$$

where $\mathbf{R}$ is the covariance matrix. If it is not known in advance, it must be replaced by the sample covariance matrix in the above test. The probability of detection is then equal to [20],

$$P_D = Q\!\left(Q^{-1}(P_{FA}) - d\right) \qquad (6)$$

in which $P_{FA}$ is the probability of false alarm and $d^2 = \mathbf{s}^T \mathbf{R}^{-1} \mathbf{s}$.
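As a concrete illustration of the reconstructed test (5)-(6), the sketch below implements the whitened matched filter together with its predicted detection probability; the interface and variable names are our assumptions.

```python
import numpy as np
from scipy.stats import norm

def matched_filter(y, s, R, p_fa):
    """Matched filter of (5): t(y) = s^T R^{-1} y against a threshold
    chosen for the target false-alarm probability p_fa. Sketch only."""
    w = np.linalg.solve(R, s)                 # R^{-1} s
    t = w @ y                                 # test statistic s^T R^{-1} y
    d = np.sqrt(s @ w)                        # deflection, d^2 = s^T R^{-1} s
    eta = norm.isf(p_fa) * d                  # std of t under H0 is d
    p_d = norm.sf(norm.isf(p_fa) - d)         # predicted P_D from (6)
    return t > eta, t, p_d
```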

Another well-known detector is the generalized LRT (GLRT) [21], which is derived by maximizing the conditional densities constituting the likelihood ratio test with respect to the unknown parameters. Assuming the covariance matrix to be unknown, the following detection criterion is obtained,

$$\frac{|\mathbf{s}^T \hat{\mathbf{R}}^{-1} \mathbf{y}|^2}{(\mathbf{s}^T \hat{\mathbf{R}}^{-1} \mathbf{s})\,(K + \mathbf{y}^T \hat{\mathbf{R}}^{-1} \mathbf{y})} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \qquad (7)$$

where $\hat{\mathbf{R}}$ is the sample covariance matrix and $K$ is the number of snapshots available for its estimation. In [21] no optimality was claimed for the GLRT. However, Scharf and Friedlander have shown that the GLRT is uniformly most powerful (UMP) invariant [8], which is the strongest statement of optimality derived for a detector. The GLRT detector may be interpreted as a projection onto the null-space of the interference followed by a matched subspace detector [8]. Consider the following model for the hypothesis test,

$$H_0: \mathbf{y} = \mathbf{S}\boldsymbol{\theta} + \mathbf{n}, \qquad H_1: \mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{S}\boldsymbol{\theta} + \mathbf{n} \qquad (8)$$

where $\mathbf{S}$ spans the background or interference subspace and $\boldsymbol{\theta}$ determines the contribution of each column of $\mathbf{S}$; $\mathbf{H}$ spans the signal subspace which is to be detected, and $\mathbf{x}$ determines the contribution of each column of $\mathbf{H}$. Obviously, if $\mathbf{S}$ or $\mathbf{H}$ is full-rank, then $\mathbf{S}$ or $\mathbf{H}$ spans the whole signal space; in this case $\mathbf{S}$ or $\mathbf{H}$ may be over-fitted for background detection and signal detection, respectively. On the other hand, over-restricting the ranks of $\mathbf{S}$ and $\mathbf{H}$ may result in unreliable subspaces which cannot fit a suitable matched subspace. The matched subspace detector takes the form,

$$t(\mathbf{y}) = \mathbf{y}^T \mathbf{P}_{S}^{\perp} \mathbf{P}_{G} \mathbf{P}_{S}^{\perp} \mathbf{y} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \qquad (9)$$

where $\mathbf{P}_{S}^{\perp}$ is the orthogonal projection matrix onto the null-space of $\mathbf{S}^T$ and $\mathbf{G} = \mathbf{P}_{S}^{\perp}\mathbf{H}$ is the part of the signal subspace which is not accounted for by the subspace spanned by $\mathbf{S}$. Figure 1 shows the block diagram of this detector.

Figure 1: Block diagram of the matched subspace detector.
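The projections in Figure 1 can be sketched in a few lines; the following is a minimal illustration of the reconstructed statistic (9), not the authors' implementation.

```python
import numpy as np

def matched_subspace_statistic(y, H, S):
    """Matched subspace sketch for (8)-(9): remove the interference
    subspace S, then measure the energy matched to the signal subspace H."""
    P_S = S @ np.linalg.pinv(S)        # projector onto span(S)
    P_S_perp = np.eye(len(y)) - P_S    # projector onto its orthogonal complement
    G = P_S_perp @ H                   # signal subspace after interference removal
    P_G = G @ np.linalg.pinv(G)        # projector onto span(G)
    z = P_S_perp @ y
    return z @ (P_G @ z)               # statistic y^T P_S_perp P_G P_S_perp y
```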

In the conclusion of [8], the authors mention that the bases can be extracted from the discrete cosine transform, the wavelet transform, or learned by a data-dependent analysis like principal component analysis (PCA). Using such bases provides a matched subspace for the whole set of desired signals to be detected. For an illustration refer to Figure 2, which shows some 3D data composed of signal and non-signal (interference and noise) parts. Two low-rank subspaces are shown, a rank-1 and a rank-2 subspace (the low-rank matched subspaces), obtained by PCA.

Figure 2: Two low-rank subspaces learned by PCA for some 3D signals.

The main contribution of [8] may be answering 'no' to the question 'Can the GLRT be improved upon?', although no prior information on the structure of the low-rank matched filter was assumed there. A structural assumption can be imposed by placing a sparse prior on the coefficients of $\mathbf{H}$ and $\mathbf{S}$. The method proposed in this paper suggests using the ULRS model for signals due to its good fit, which has been demonstrated in many signal processing applications. Instead of traditional analyses like PCA, modern analyses like the methods proposed in [22], [23] and [24] can be exploited in order to recover suitable bases spanning these low-rank subspaces. Figure 3 shows a union of matched low-rank subspaces corresponding to the data of Figure 2.

Compressive detection is another application of sparse theory to signal detection, studied in [25] and [15]. Instead of dealing with all the samples of the signal, the compressive detector works with a few measurements. This detector distinguishes between the two hypotheses,

$$H_0: \mathbf{z} = \boldsymbol{\Phi}\mathbf{n}, \qquad H_1: \mathbf{z} = \boldsymbol{\Phi}(\mathbf{s} + \mathbf{n}) \qquad (10)$$

where $\boldsymbol{\Phi} \in \mathbb{R}^{M \times N}$ is the measurement matrix and $\mathbf{z}$ is the measurement vector. If no further prior is known about $\mathbf{s}$, no optimal $\boldsymbol{\Phi}$ can be designed, and random measurements yield a detector with the following performance [25],

$$P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{\tfrac{M}{N}}\,\frac{\|\mathbf{s}\|_2}{\sigma}\right) \qquad (11)$$

in which the performance of the detector is degraded by the factor $\sqrt{M/N}$ compared to the traditional matched filter. Knowledge of $\mathbf{s}$ results in a compressive detector, as shown in [25], with

$$P_D = Q\!\left(Q^{-1}(P_{FA}) - \frac{\|\mathbf{s}\|_2}{\sigma}\right) \qquad (12)$$

in which the performance of the detector is improved, by a factor of $\sqrt{N/M}$ in the deflection, compared to the random-measurement detector. Reference [15] studied two cases of knowledge about $\mathbf{s}$: the first assumes that $\mathbf{s}$ is known, and the second assumes that $\mathbf{s}$ consists of a set of parametric bases, where the active bases of $\mathbf{s}$ can be recovered by a sparse coding algorithm. Recently, [26] investigated the problem of detection of a union of low-rank subspaces via compressed measurements; the compressed detector still performs worse than the matched filter by the factor $\sqrt{M/N}$.
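A small simulation of the compressive detector of (10)-(11) might look as follows; the orthonormalized random measurement matrix and the specific dimensions are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import norm

# Sketch: detect a known signal s from M random projections of N samples.
# With an orthonormalized random measurement matrix, the deflection
# shrinks roughly by sqrt(M/N) relative to the full matched filter.
rng = np.random.default_rng(0)
N, M, sigma = 512, 64, 1.0
s = rng.standard_normal(N)
Q_mat, _ = np.linalg.qr(rng.standard_normal((N, M)))  # orthonormal columns
Phi = Q_mat.T                                         # M x N measurement matrix

z = Phi @ (s + sigma * rng.standard_normal(N))        # observation under H1
t = (Phi @ s) @ z                                     # compressive matched filter
p_fa = 1e-3
d_compressed = np.sqrt(M / N) * np.linalg.norm(s) / sigma
print("predicted P_D:", norm.sf(norm.isf(p_fa) - d_compressed))
```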

In this paper we exploit the low-rank structure of the signals to design a new detector. Our detector is not compressive; the goal is to design a generalized detector using the sparsity (that is, the structure) which implicitly exists in the signals. In Section 3 the proposed detector is presented: it first assumes a sparse model for the signals and then derives an optimum decision rule.

Figure 3: A union of rank-1 subspaces provides suitable matched subspaces.

III The Proposed Approach

In this section, we introduce our model for signal detection. We want to distinguish between the two hypotheses $H_0$ and $H_1$:

$$H_0: \mathbf{y} = \mathbf{n}, \qquad H_1: \mathbf{y} = \mathbf{D}\mathbf{x} + \mathbf{e} + \mathbf{n} \qquad (13)$$

where $\mathbf{D}$ is the dictionary, which can be interpreted as a bank of matched filters, and $\mathbf{e}$ is the error vector of the model, which denotes the mismatch between the exact matched filter and the union of subspaces spanned by the columns of $\mathbf{D}$. Assume that $\mathbf{n}$ is zero-mean white Gaussian noise with variance $\sigma^2$, i.e., $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. In our method, the signal ($\mathbf{D}\mathbf{x}$) matched to the observed signal ($\mathbf{y}$) is unknown, so it must be determined. This section is divided into four subsections. In the first subsection, we analyze the role of the coefficients of the linear combination ($\mathbf{x}$) and then describe our approach for coefficient estimation. In the second subsection, the performance of our proposed detection method is analyzed. Since dictionary learning is a critical issue in the model, the third subsection is devoted to a discussion of dictionary learning. In the last subsection we explain how our method can be made robust for detecting signals that are contaminated by gross errors.

III-A A Discussion on the Coefficients ($\mathbf{x}$)

A linear combination of the dictionary atoms generates the matched signal for detection. Three cases are considered for estimating $\mathbf{x}$: first, the unconstrained solution; second, the matched filter bank; and third, a Gaussian prior. First assume that there is no constraint on $\mathbf{x}$, i.e., the orthogonal projection of the signal onto the span of the desired subspace. This approach is used in the matched subspace method to identify the part of the signal that accounts for the desired signals [8]. The solution is,

$$\hat{\mathbf{x}} = (\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T\mathbf{y} \qquad (14)$$

This answer suffers from over-fitting, as some signals that do not contain the target signal may still be decomposed in terms of the atoms. More restrictive constraints may alleviate this problem. Now let us assume that just one element of $\mathbf{x}$ is allowed to be nonzero. This constraint helps reduce over-fitting. By this assumption the problem becomes,

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 \quad \text{s.t.} \quad \|\mathbf{x}\|_0 = 1 \qquad (15)$$

The solution is zero except in the position corresponding to the atom with maximum correlation. This solution is nothing but the traditional matched filter bank: the matched filter with the highest correlation provides the matched signal, and all the correlations together are a sufficient statistic for the decision. If all the correlations are less than a threshold, no detection is declared.

The third scenario we study assumes a Gaussian prior on $\mathbf{x}$. The motivation for this assumption is to avoid over-learning and, moreover, to have less sensitive coefficients. The estimate of $\mathbf{x}$ under Gaussian distributions on $\mathbf{x}$ and $\mathbf{n}$ can be obtained as,

$$\hat{\mathbf{x}} = (\mathbf{D}^T\mathbf{D} + \lambda\mathbf{I})^{-1}\mathbf{D}^T\mathbf{y} \qquad (16)$$

This solution for the coefficients of the linear combination is ridge regression [27]. Solution (15) is the least over-learned and solution (14) is the most over-learned one. It is interesting to see how each of the solutions covers the signal space for learning. Solution (15) provides high learning for a few one-dimensional subspaces corresponding to each atom, while solutions (14) and (16) provide high learning for many subspaces corresponding to arbitrary selections of the atoms. Involving all the atoms to form the matched signal results in detecting undesired signals as the target signal, due to the expansion of the matched subspaces. To keep the number of involved atoms limited, we suggest modifying problem (16) as follows,

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 + \gamma\|\mathbf{x}\|_0 \qquad (17)$$

There is a large enough value of $\gamma$ such that the solution of the above problem is the same as (15). Now we show that this problem is the MAP estimation of $\mathbf{x}$ under a multivariate independent Gaussian prior,

$$p(\mathbf{x}) \propto \exp\!\left(-\tfrac{1}{2}\,\mathbf{x}^T\mathbf{W}^{-1}\mathbf{x}\right) \qquad (18)$$

where $\mathbf{W}$ is a diagonal matrix. By this assumption, two unknowns must be estimated. First we obtain the ML estimate of $\mathbf{W}$,

$$\hat{\mathbf{W}} = \arg\max_{\mathbf{W}} \log p(\mathbf{x}; \mathbf{W}) \qquad (19)$$

Setting the derivative with respect to $\mathbf{W}$ equal to zero gives $\mathbf{W} = \mathbf{x}\mathbf{x}^T$, which is singular and hence not an admissible solution; however, we need only the diagonal elements of $\mathbf{W}$ due to the independence assumption on the entries of $\mathbf{x}$. So, calculating the derivative with respect to only the diagonal elements of $\mathbf{W}$ ($w_{ii}$) results in,

$$w_{ii} = x_i^2 + \epsilon \qquad (20)$$

where $\epsilon$ is a small positive value for avoiding division by zero. Then we insert the obtained $\mathbf{W}$ in (18):

$$p(\mathbf{x}) \propto \exp\!\left(-\tfrac{1}{2}\sum_i \frac{x_i^2}{x_i^2 + \epsilon}\right) \qquad (21)$$

Figure 4: Sparse representation of some structured data whose distribution is in agreement with the one defined by Eq. (21).

Actually, $\mathbf{W}$ is an auxiliary parameter which is used just for better adaptation of the coefficient distribution. The obtained $\mathbf{W}$ results in a distribution placing more probability on orthogonal low-rank subspaces in the coefficient space to which $\mathbf{x}$ belongs (for an illustration see Fig. 4). Corresponding to these orthogonal low-rank subspaces, there are non-orthogonal low-rank subspaces in the observation domain to which $\mathbf{D}\mathbf{x}$ belongs. The MAP estimation of $\mathbf{x}$ under the prior (21) results in the suggested problem (17), which is a generalized version of (15) from the aspect of the sparsity level of the coefficients, and a generalized version of (16) from the aspect of the prior distribution on the coefficients. In [4], it is proved that under a certain condition, problem (17) leads to the same solution as the following constrained problem:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 \quad \text{s.t.} \quad \|\mathbf{x}\|_0 \le T \qquad (22)$$
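One natural way to compute the MAP estimate behind (17)-(21) is to alternate between the weight update (20) and the resulting weighted ridge problem, in the spirit of iteratively reweighted least squares; this sketch and its parameter names are ours, not the paper's algorithm.

```python
import numpy as np

def reweighted_ridge(D, y, gamma=0.1, eps=1e-6, n_iter=20):
    """IRLS-style sketch for (17)-(21): alternate the diagonal prior
    estimate w_i = x_i^2 + eps (20) with the weighted ridge MAP update."""
    x = np.linalg.lstsq(D, y, rcond=None)[0]   # start from the projection (14)
    for _ in range(n_iter):
        w_inv = 1.0 / (x**2 + eps)             # inverse prior variances, eq. (20)
        # MAP update: (D^T D + gamma * diag(w_inv))^{-1} D^T y
        x = np.linalg.solve(D.T @ D + gamma * np.diag(w_inv), D.T @ y)
    return x
```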

III-B Performance Analysis

First, we define the false alarm rate and the detection rate,

$$P_{FA} = \Pr\{t(\mathbf{y}) > \eta \mid H_0\}, \qquad P_D = \Pr\{t(\mathbf{y}) > \eta \mid H_1\} \qquad (23)$$

The parameter $\eta$ is set to satisfy the desired false alarm probability $P_{FA}$. The sufficient statistic for decision making is $t(\mathbf{y}) = \mathbf{y}^T\mathbf{D}\hat{\mathbf{x}}$; under $H_0$ it is zero-mean Gaussian, so

$$P_{FA} = Q\!\left(\frac{\eta}{\sigma\,\|\mathbf{D}\hat{\mathbf{x}}\|_2}\right) \qquad (24)$$

By solving (24) for $\eta$, the threshold for the decision rule is obtained,

$$\eta = c\,\sigma \qquad (25)$$

where $c$ is a constant value depending on $\mathbf{D}$ and $\hat{\mathbf{x}}$ and the desired $P_{FA}$. It is easy to show that,

$$P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{1 - \mathrm{ESR}}\;\frac{\|\mathbf{D}\mathbf{x} + \mathbf{e}\|_2}{\sigma}\right) \qquad (26)$$

where $\mathrm{ESR} = \|\mathbf{e}\|_2^2 / \|\mathbf{D}\mathbf{x} + \mathbf{e}\|_2^2$ is the error-to-signal ratio of the model. As can be seen, the performance of the detector is degraded by the factor $\sqrt{1 - \mathrm{ESR}}$. But our detector has learned a suitable space for the signals to be detected; in other words, we accept a small deterioration of the performance due to the generalization of the detector. The flexibility of the sparse representation based detector is its most distinctive advantage. Dictionary learning [23] is the most important issue for methods based on sparse representation. In the sparse detector, the dictionary should be learned such that the ESR is small, to avoid performance deterioration, and at the same time not too small, to avoid over-learning. In the next subsection we explain how to learn an appropriate dictionary. In (25), sparsity has no effect on the performance. We now introduce a decision rule for detection that exploits the sparsity of the coefficients. To this end, we solve the detection problem using the prior obtained in Equation (21). The new decision rule can be achieved as follows,

(27)

where $\beta$ is a positive constant. As the sparsity increases, $H_1$ becomes more probable, because the representation of a signal in terms of the dictionary is sparse only for the learned signals. Similar to (26), it is easy to show that,

(28)

where $f(\cdot)$ is an increasing homogeneous function. As can be seen, the probability of detection increases (decreases) when sparsity increases (decreases) for false alarm rates smaller than 0.5 (because $Q^{-1}(P_{FA}) > 0$ when $P_{FA} < 0.5$). As the desired false alarm rates are usually small, the probability of detection increases in this region (it is favorable for a detector that the top-left region of its ROC be close to the ideal ROC). If the representation of a signal is sparse, the signal lies in the desired low-rank subspace, that is, it meets our assumed model for the target signals. Thus the probability of detection increases for signals that have a sparse representation in terms of the dictionary atoms, which is exactly what we expect from sparsity. Figure 5 shows the ROC of (22) at SNR = +20 dB for different sparsity levels.

Figure 5: Trade-off between sparsity and ESR and its effect on the detector performance.
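Under the reconstructed expression (26), the sparsity/ESR trade-off of Figure 5 can be explored numerically; the sketch below assumes that form of $P_D$ and is only illustrative.

```python
import numpy as np
from scipy.stats import norm

def predicted_pd(p_fa, snr_amplitude, esr):
    """P_D predicted by the reconstructed Eq. (26): the deflection of the
    ideal detector is shrunk by sqrt(1 - ESR), where ESR is the model's
    error-to-signal ratio (an assumption of this sketch)."""
    return norm.sf(norm.isf(p_fa) - np.sqrt(1.0 - esr) * snr_amplitude)

# trade-off illustration: a sparser (smaller) model usually has a larger ESR
for esr in (0.0, 0.1, 0.3):
    print(esr, predicted_pd(p_fa=1e-2, snr_amplitude=3.0, esr=esr))
```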

Traditional matched filter banks have the highest sparsity level, but they are not practical; for instance, in voice activity detection it is not feasible to collect all possible voices in a bank. A small number of filters results in a high ESR and low performance. Our proposed detector makes a trade-off between ESR and sparsity in order to obtain good detector performance. Dictionary learning plays a critical role in this trade-off, which is studied in the following.

III-C Learning the Dictionary

In this section we explain the role of dictionary learning in the proposed detection method. In many detection problems, the number of training signals may not be as large as the number of possible matched filters that would cover the whole target signal space. In the proposed approach, we search for a dictionary, learned from a finite set of signals, that efficiently represents those signals. The dictionary should be general enough to deal with a signal that has not been seen before. Assume that we have a set of training signals $\{\mathbf{s}_i\}_{i=1}^{L}$. Dictionary learning is a function that maps this set to a dictionary $\mathbf{D}$ with $K \ll L$ atoms. An appropriate dictionary should have a small ESR, to be a suitable representation of the training data, but the ESR should not be too small, so that the dictionary stays general and is not over-learned for only the training data. Two algorithms for dictionary learning are presented.

III-C1 K-Means Algorithm

The K-means method uses the $K$ centroids of clusters to characterize the training data [28]. They are determined by minimizing the sum of squared errors,

$$\min_{\mathbf{D}} \sum_{i=1}^{L} \min_{k} \|\mathbf{s}_i - \mathbf{d}_k\|_2^2 \qquad (29)$$

where the columns of $\mathbf{D}$ are the centroids $\mathbf{d}_k$, $k = 1, \dots, K$. The resulting dictionary assigns a centroid to each training datum, and $K$ should be large enough to satisfy the desired amount of ESR. Problem (15) has to be solved to determine the coefficients, so that only one of them is nonzero. This dictionary learns some points in the signal space; as the distance from these points increases, the level of learning decreases. In other words, this dictionary is obtained under a union-of-spheres model, which may not be a suitable choice for ordinary signals. The next algorithm agrees with a more appropriate model for the data: K-SVD learns the signal space as a union of low-rank subspaces.
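A K-means dictionary as in (29) can be obtained directly with scikit-learn; the data matrix and sizes below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Dictionary by K-means, Eq. (29): the atoms are the K cluster centroids,
# and each signal is coded by its single nearest atom, matching the
# one-nonzero constraint of (15). `signals` is an (L, n) training matrix.
signals = np.random.randn(1000, 24)        # placeholder training data
km = KMeans(n_clusters=100, n_init=10).fit(signals)
D = km.cluster_centers_.T                  # n x K dictionary of centroids
labels = km.predict(signals)               # index of the single active atom
```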

III-C2 K-SVD Algorithm

By extending the union of spheres to a union of low-dimensional subspaces, the K-means algorithm is generalized to the K-SVD algorithm [23]. This flexible model agrees with many signals, such as images and audio signals. For example, natural images have sparse representations in terms of the DCT dictionary; in other words, by combining only a few DCT bases it is possible to approximate the blocks of an image. The following problem provides the dictionary learned by K-SVD,

$$\min_{\mathbf{D},\mathbf{X}} \|\mathbf{S} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{x}_i\|_0 \le T,\ \forall i \qquad (30)$$

This algorithm is based on atom-by-atom updating of the columns of $\mathbf{D}$. Recently, more efficient algorithms for atom-by-atom updating have been suggested in [29]. Each arbitrary selection of a few columns characterizes a cluster corresponding to a subspace. The dictionary learned by K-SVD is in agreement with the proposed problem (22). After learning, test signals that lie on the learned low-dimensional subspaces can be reconstructed and detected. In addition to dictionary learning from training signals, it is possible to design a dictionary using parametric functions [30]; the kernels of the FFT and the DCT are two examples of this class of dictionaries, where the bases sweep the frequency parameter. Figure 6 shows the block diagram of the proposed detection method.

Figure 6: Block diagram of our proposed detector based on dictionary learning.
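The atom-by-atom update at the heart of K-SVD (30) can be sketched as follows; the sparse coding of X between sweeps (e.g., by the OMP sketch above) is assumed to happen elsewhere.

```python
import numpy as np

def ksvd_atom_update(S, D, X):
    """One K-SVD sweep for problem (30): each atom and its coefficient row
    are refit by a rank-1 SVD of the residual restricted to the signals
    that actually use that atom. Minimal sketch."""
    for k in range(D.shape[1]):
        users = np.nonzero(X[k, :])[0]       # signals that use atom k
        if users.size == 0:
            continue
        # residual with atom k's contribution added back
        E = S[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, k] = U[:, 0]                    # best rank-1 fit: new atom
        X[k, users] = s[0] * Vt[0, :]        # and its coefficients
    return D, X
```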

III-D Robustness

Assume that a dictionary has been learned to detect face images without sunglasses. If a face image with sunglasses is given to the detector, the gross error in the region of the eyes may result in a wrong decision. To handle this problem, an error distribution with a longer tail than the Gaussian has to be assumed; the Laplace distribution is our suggestion. Thus, $H_1$ implies that the observed signal is a combination of a few atoms of $\mathbf{D}$, a Laplace-distributed error and Gaussian-distributed noise,

$$\mathbf{y} = \mathbf{D}\mathbf{x} + \mathbf{e} + \mathbf{n}, \qquad \mathbf{e} \sim \text{Laplace}, \quad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}) \qquad (31)$$

The problem of coefficient estimation for (22) under the new prior assumption has already been studied in robust statistics [31],

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{i} \rho\big(y_i - (\mathbf{D}\mathbf{x})_i\big) \quad \text{s.t.} \quad \|\mathbf{x}\|_0 \le T \qquad (32)$$

where,

$$\rho(t) = \begin{cases} \tfrac{1}{2}t^2, & |t| \le \lambda \\ \lambda|t| - \tfrac{1}{2}\lambda^2, & |t| > \lambda \end{cases} \qquad (33)$$

In other words, small errors and large errors are penalized by the $\ell_2$ norm and the $\ell_1$ norm, respectively. $\lambda$ is the parameter of the mixture distribution of the Gaussian and the Laplacian. Let us rewrite (32) as follows,

$$\hat{\mathbf{x}}, \hat{\mathbf{e}} = \arg\min_{\mathbf{x},\mathbf{e}} \|\mathbf{y} - \mathbf{D}\mathbf{x} - \mathbf{e}\|_2^2 + \lambda\|\mathbf{e}\|_1 \quad \text{s.t.} \quad \|\mathbf{x}\|_0 \le T \qquad (34)$$

Let us define the extended dictionary and coefficient vector as,

$$\tilde{\mathbf{D}} = [\mathbf{D} \;\; c\,\mathbf{I}], \qquad \tilde{\mathbf{x}} = \begin{bmatrix}\mathbf{x} \\ \mathbf{e}/c\end{bmatrix} \qquad (35)$$

By substituting $\tilde{\mathbf{D}}$ and $\tilde{\mathbf{x}}$, we have,

$$\hat{\tilde{\mathbf{x}}} = \arg\min_{\tilde{\mathbf{x}}} \|\mathbf{y} - \tilde{\mathbf{D}}\tilde{\mathbf{x}}\|_2^2 \quad \text{s.t.} \quad \|\tilde{\mathbf{x}}\|_0 \le \tilde{T} \qquad (36)$$

This problem is similar to (22), except that its dictionary is extended by a scaled identity matrix. The identity matrix projects inappropriate parts of the signals onto the corresponding coefficients; inappropriate parts may be large errors, interference outside the desired subspaces, or outlier data. The authors of [32] also used the same dictionary, on intuitive grounds, to obtain a robust framework for face recognition. The same procedure can be pursued to learn a robust dictionary from a set of unreliable data [33].
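The extended-dictionary trick of (35)-(36) is easy to sketch on top of the OMP routine given earlier; the scale `c` of the identity block and the interface are our assumptions, and D is assumed to have unit-norm columns.

```python
import numpy as np

def robust_sparse_code(D, y, sparsity, c=1.0):
    """Robust coding via the extended dictionary of (35)-(36): D is
    augmented with a scaled identity so that gross errors are absorbed
    by the extra coefficients. Reuses the `omp` sketch defined earlier."""
    n, K = D.shape
    D_ext = np.hstack([D, c * np.eye(n)])   # extended dictionary [D, cI]
    x_ext = omp(D_ext, y, sparsity)
    x, e = x_ext[:K], c * x_ext[K:]         # signal coefficients, gross error
    return x, e
```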

IV Experimental Results

We evaluated the performance of our proposed method in the case study of VAD. To construct the learned dictionary, clean speech signals from the NOIZEUS database were used [34]. In the NOIZEUS database, thirty sentences were selected which include all phonemes of American English. The sentences were produced by three male and three female speakers, originally sampled at 25 kHz and down-sampled to 8 kHz. We divided the clean speech signals into 25-ms frames with a 10-ms frame shift. After removing the silent frames, we extracted from each speech frame standard Mel-frequency cepstral coefficients (MFCC) using 10 Mel triangular filters, the energy values computed at each of the 10 Mel triangular filters, the total energy (the first cepstral coefficient) and the entropy. MFCC features capture the most relevant information of the speech signal, and they are widely used in speech and speaker recognition, making the VAD method easy to integrate with existing applications. Our feature vector was thus 24-dimensional, and the total number of vectors was about 6300. Using the K-SVD algorithm, we obtained a learned dictionary with 100 atoms, which was used in the following experiments for obtaining the sparse representations with OMP.
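A rough sketch of this pipeline (framing, MFCC-type features, sparse coding over the learned dictionary) is given below; the exact feature recipe and the decision threshold are assumptions of the sketch, and `omp` and `D` refer to the earlier sketches.

```python
import numpy as np
import librosa

# Hypothetical pipeline sketch: 25-ms frames with 10-ms shift at 8 kHz,
# MFCC-type features per frame, then a sparse reconstruction score over
# the learned dictionary D (atoms as columns, e.g. from K-SVD above).
signal, sr = librosa.load("sp10.wav", sr=8000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=24,
                            n_fft=200, hop_length=80)  # 25 ms / 10 ms at 8 kHz
scores = []
for frame in mfcc.T:                       # one 24-dim feature vector per frame
    x = omp(D, frame, sparsity=3)          # sparse code over the dictionary
    scores.append(np.linalg.norm(D @ x))   # energy captured by the union of subspaces
# threshold would be set for the desired P_FA; the median is a placeholder
speech = np.array(scores) > np.quantile(scores, 0.5)
```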

Figure 7: ROC curves for VAD using the proposed method (PD versus PF).

To evaluate the performance of the proposed method, the speech detection probability PD and the false alarm probability PF were measured against a reference decision. A clean test speech file (sp10.wav), taken from the NOIZEUS database and down-sampled to 8 kHz, was used for the reference decisions. To simulate noisy environments, several noise signals from the NOIZEUS database were used. The noise signals included recordings from different environments (Babble (crowd of people), Car, …) at SNRs of 0 dB, 5 dB, 10 dB, and 15 dB. The ROC curves for VAD using our proposed method are illustrated in Fig. 7, which shows PD versus PF.

Sparsity has been exploited in voice activity detection before; e.g., in [11] a feature extraction is performed to suggest a decision rule for detection. We compared the result of our method with the sparsity-based VAD method proposed in [11]. As can be seen in Fig. 8, our method shows better performance in low-SNR conditions.

Figure 8: Comparison of the performance of our proposed method with the sparse non-negative coding based VAD [11], the matched subspace detector [8] with bases found by PCA, and detection using compressed measurements [20].

V Conclusion

This paper presented a new sparsity-based detector. The performance of the method was evaluated in a realistic application: voice activity detection in speech signal processing. Our detector introduces a new trade-off for designing detectors by assuming the union of low-rank subspaces model; the trade-off is between the sparsity and the error of the union of low-rank subspaces model, denoted by ESR. In our detector the number of filter banks is proportional to the size of the dictionary, and an appropriate dictionary is able to regularize both the sparsity and the introduced ESR. Simulation results showed that the proposed method is effective and has high noise robustness, due to the projection of signals onto reliably learned low-rank subspaces.

References

  • [1] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer Publishing Company, Incorporated, 1st ed., 2010.
  • [2] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Rev., vol. 51, pp. 34–81, Feb. 2009.
  • [3] A. Rahmoune, P. Vandergheynst, and P. Frossard, “Sparse approximation using m-term pursuit and application in image and video coding,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1950–1962, 2012.
  • [4] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” Trans. Img. Proc., vol. 15, pp. 3736–3745, Dec. 2006.
  • [5] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 1st ed., 2010.
  • [6] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Supervised dictionary learning,” in Advances in Neural Information Processing Systems, 2008.
  • [7] A. Benyassine, E. Shlomot, H.-Y. Su, D. Massaloux, C. Lamblin, and J.-P. Petit, “ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications,” IEEE Communications Magazine, vol. 35, pp. 64–73, Sep. 1997.
  • [8] L. Scharf and B. Friedlander, “Matched subspace detectors,” Signal Processing, IEEE Transactions on, vol. 42, pp. 2146–2157, Aug 1994.
  • [9] S. Kraut, L. Scharf, and L. McWhorter, “Adaptive subspace detectors,” Signal Processing, IEEE Transactions on, vol. 49, pp. 1–16, Jan 2001.
  • [10] P. Ahmadi, S. Khoram, M. Joneidi, I. Gholampour, and M. Tabandeh, “Discovering motion patterns in traffic videos using improved group sparse topical coding,” in Telecommunications (IST), 2014 7th International Symposium on, pp. 343–348, Sept 2014.
  • [11] P. Teng and Y. Jia, “Voice activity detection via noise reducing using non-negative sparse coding,” Signal Processing Letters, IEEE, vol. 20, pp. 475–478, May 2013.
  • [12] T. Le, K. Luu, and M. Savvides, “SparCLeS: Dynamic sparse classifiers with level sets for robust beard/moustache detection and segmentation,” IEEE Transactions on Image Processing, vol. 22, pp. 3097–3107, Aug. 2013.
  • [13] B. Shim and B. Song, “Multiuser detection via compressive sensing,” Communications Letters, IEEE, vol. 16, pp. 972–974, July 2012.
  • [14] M. Duarte, M. Davenport, M. Wakin, and R. Baraniuk, “Sparse signal detection from incoherent projections,” in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 3, pp. III–III, May 2006.
  • [15] M. Davenport, P. Boufounos, M. Wakin, and R. Baraniuk, “Signal processing with compressive measurements,” Selected Topics in Signal Processing, IEEE Journal of, vol. 4, pp. 445–460, April 2010.
  • [16] Y. Lu and M. Do, “A theory for sampling signals from a union of subspaces,” Signal Processing, IEEE Transactions on, vol. 56, pp. 2334–2345, June 2008.
  • [17] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, pp. 171–184, Jan 2013.
  • [18] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” Signal Processing, IEEE Transactions on, vol. 41, pp. 3397–3415, Dec 1993.
  • [19] J. Tropp and S. Wright, “Computational methods for sparse solution of linear inverse problems,” Proceedings of the IEEE, vol. 98, pp. 948–958, June 2010.
  • [20] M. A. Davenport, M. B. Wakin, and R. G. Baraniuk, “Detection and estimation with compressive measurements,” tech. rep., 2006.
  • [21] E. Kelly, “An adaptive detection algorithm,” Aerospace and Electronic Systems, IEEE Transactions on, vol. AES-22, pp. 115–127, March 1986.
  • [22] M. Sadeghi, M. Joneidi, M. Babaie-Zadeh, and C. Jutten, “Sequential subspace finding: A new algorithm for learning low-dimensional linear subspaces,” in Signal Processing Conference (EUSIPCO), 2013 Proceedings of the 21st European, pp. 1–5, Sept 2013.
  • [23] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 4311–4322, Nov. 2006.
  • [24] R. Rubinstein, T. Peleg, and M. Elad, “Analysis k-svd: A dictionary-learning algorithm for the analysis sparse model,” Signal Processing, IEEE Transactions on, vol. 61, pp. 661–677, Feb 2013.
  • [25] Z. Wang, G. Arce, and B. Sadler, “Subspace compressive detection for sparse signals,” in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pp. 3873–3876, March 2008.
  • [26] Y. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” Information Theory, IEEE Transactions on, vol. 55, pp. 5302–5316, Nov 2009.
  • [27] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, pp. 55–67, 1970.
  • [28] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recogn. Lett., vol. 31, pp. 651–666, June 2010.
  • [29] M. Sadeghi, M. Babaie-Zadeh, and C. Jutten, “Learning overcomplete dictionaries based on atom-by-atom updating,” Signal Processing, IEEE Transactions on, vol. 62, pp. 883–891, Feb 2014.
  • [30] M. Yaghoobi, L. Daudet, and M. Davies, “Parametric dictionary design for sparse coding,” Signal Processing, IEEE Transactions on, vol. 57, pp. 4800–4810, Dec 2009.
  • [31] P. Huber, J. Wiley, and W. InterScience, Robust statistics. Wiley New York, 1981.
  • [32] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 210–227, Feb 2009.
  • [33] S. Amini, M. Sadeghi, M. Joneidi, M. Babaie-Zadeh, and C. Jutten, “Outlier-aware dictionary learning for sparse representation,” in Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on, pp. 1–6, Sept 2014.
  • [34] P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC Press, Inc., 2nd ed., 2013.