I. Introduction
Learned-dictionary-based sparse representation (SR) finds successful application in various signal processing domains, such as image denoising [1], image recognition [2, 3, 4, 5, 6], speaker identification/verification [7, 8, 9], and fingerprint identification [10]. In the SR domain, existing data-driven dictionary learning techniques can be broadly divided into three categories: supervised, semi-supervised and unsupervised. Dictionaries learned utilizing class labels are referred to as supervised, whereas those learned using weak supervision in the form of an assumed structure/constraint are termed semi-supervised. Both these kinds of dictionaries produce more discriminative sparse codes than the unsupervised ones and thus result in better classification performance. In the SR domain, redundant (overcomplete) dictionaries are usually preferred. Such dictionaries have more columns (atoms) than rows (data dimensionality). Sometimes, fewer examples than the data dimensionality are available for learning the dictionary; in that case, only an undercomplete dictionary can be learned unless we project the data to an appropriate low-dimensional space. Nevertheless, the use of undercomplete dictionaries has been reported in SR based classification tasks [9, 11].
In the recent past, dictionary learning has received a lot of attention in the SR domain. Combining K-means clustering and the singular value decomposition (SVD), a widely used dictionary learning approach referred to as K-SVD has been proposed [12]. In learning the K-SVD dictionary, the reconstruction error is minimized under a sparsity constraint for the given data. Though it is not optimized for producing discriminative sparse codes, some works have explored K-SVD dictionaries in classification tasks [4, 9]. In the literature, a few classification-driven dictionaries are also proposed, such as the supervised K-SVD (S-KSVD) [13] dictionary and the label-consistent K-SVD (LC-KSVD) [14] dictionary. The S-KSVD algorithm incorporates the Fisher discriminant criterion in dictionary learning, whereas in the LC-KSVD algorithm, a linear transformation that maps the sparse codes to more discriminative ones is learned along with the dictionary. Though yielding enhanced classification performance, these supervised dictionaries neither use any block structure nor explicitly minimize the within-class redundancy. As a result, such dictionaries are found to yield inconsistent sparse codes for same-class data.
In addition to supervised dictionaries, some block-structured dictionaries have also been proposed. The introduction of a block structure in a dictionary is noted to enhance not only its reconstruction ability [15] but also its classification ability [16]. Initial works simply exploit known block structures in sparse coding, with no emphasis on learning such dictionaries [17, 18, 19, 20]. The block K-SVD (BK-SVD) [15] dictionary is probably the first attempt towards learning an unsupervised block-structured dictionary. It employs a sparse agglomerative clustering (SAC) algorithm for estimating the unknown block structure. Given a dictionary, the SAC algorithm estimates the block structure by iteratively grouping its atoms based on sparse coding. As the SAC employs orthogonal matching pursuit (OMP) [21] for sparse coding, the grouped atoms happen to be diverse (less correlated). Thus, if the given dictionary comprises correlated atoms, those are less likely to be grouped together in the SAC approach. This affects the classification performance due to inconsistency in sparse coding. Addressing the above-mentioned weakness in the estimated block structure is the prime motivation behind this work.
Further, in the context of image recognition, we come across a proposal of a supervised block-structured dictionary learning approach that employs intra-block coherence suppression for reducing the redundancy and is referred to as the IBCS [3] dictionary. The minimization of intra-block coherence in a dictionary is critical for consistency in the resulting sparse codes. In that work, the block structure is initialized in a supervised manner and is kept fixed during the dictionary learning. It would be interesting to explore the adaptation of the block structure while retaining the class supervision. Motivated by these works, we propose a classification-driven dictionary learning approach and contrast its performance with existing approaches on synthetic as well as real data. The main contributions of this work are as follows:

A novel block structuring algorithm is proposed that exploits the similarity among dictionary atoms rather than that among the resultant sparse codes.

The proposed block formation approach is shown to reduce the inter-block coherence as well as to provide more precise control over the block size, in contrast to the SAC algorithm.

Class supervision is used in block formation to enhance the classification performance achieved with the learned block-structured dictionary.
The remainder of the paper is organized as follows. First, the prior work on dictionary learning using a block structure is discussed in Section II. The proposed correlation-based greedy clustering algorithm is described in Section III. In Section IV, we formulate the classification-driven dictionary learning approach. Section V and Section VI present the evaluation of the proposed approach on synthetic and real data, respectively. The paper is concluded in Section VII.
II. Prior Work on Block-Structured Dictionary
The idea behind learning a block-structured dictionary is to exploit any structure that is embedded in the signals for producing a more efficient sparse representation. A variety of algorithms have been proposed in the literature for this purpose. In initial works [17, 19], it is assumed that the block structure of the dictionary is known a priori. Later, in [15], an unsupervised SAC algorithm is proposed for deriving the block structure from the data. The dictionary is learned using the BK-SVD algorithm while iteratively updating both the block structure and the atoms of the dictionary. Learning of a block-structured dictionary $D$ along with its block structure $\mathcal{B}$ having a maximum block size $s$ can be formulated as

(1)   $\min_{D, X, \mathcal{B}} \; \|Y - DX\|_F^2$

such that $\|x_i\|_{0,\mathcal{B}} \le k \;\; \forall i$ and $|\mathcal{B}_j| \le s \;\; \forall j$,

where $D \in \mathbb{R}^{n \times K}$ is the dictionary having $K$ numbers of $n$-dimensional atoms, $Y$ is the data matrix, $X$ is the sparse code matrix with columns $x_i$, $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_{0,\mathcal{B}}$ is the $\ell_0$-norm over $\mathcal{B}$ and finds the number of nonzero blocks, $\mathcal{B}_j$ is the set of indices in the $j$-th block, $k$ is the chosen block sparsity, and $m$ is the number of blocks in $\mathcal{B} = \{\mathcal{B}_1, \ldots, \mathcal{B}_m\}$.
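Given a known block structure, the block-sparse coding in the above formulation is typically performed with a greedy block pursuit. Below is a minimal sketch of such a Block-OMP-style coder in Python/NumPy; the function and variable names are illustrative, not taken from the paper, and a full implementation would add stopping criteria and efficiency refinements.

```python
import numpy as np

def block_omp(D, y, blocks, k):
    """Greedy block-sparse coding sketch: select k blocks for target y.

    D      : (n, K) dictionary with unit-norm atoms
    y      : (n,) target vector
    blocks : list of index arrays, one per block
    k      : block sparsity (number of blocks to select)
    """
    residual = y.copy()
    selected = []
    for _ in range(k):
        # score each unselected block by the energy of D_B^T r
        scores = [np.linalg.norm(D[:, b].T @ residual)
                  if j not in selected else -np.inf
                  for j, b in enumerate(blocks)]
        selected.append(int(np.argmax(scores)))
        # least-squares fit over all atoms of the selected blocks
        idx = np.concatenate([blocks[j] for j in selected])
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x
```

With an orthonormal dictionary and one active block, the coder recovers the coefficients of that block exactly.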
The dictionary update process in the K-SVD and BK-SVD algorithms is quite similar, except that the latter involves block-by-block updates. In contrast to the K-SVD algorithm, the BK-SVD algorithm requires fewer SVDs by roughly a factor of the block size, since one SVD is performed per block rather than per atom. Thus, the computational complexity of the BK-SVD dictionary update is significantly lower. The SVD ensures that intra-block atoms are orthonormal. This minimizes the redundancy in the dictionary as well as the inconsistency in the sparse coding, hence improving the classification performance.
II-A Sparse Agglomerative Clustering
The SAC algorithm employs an iterative process for estimating the block structure from the sparse code matrix of the training data obtained using the OMP. The algorithm starts by considering each atom as a block. At each iteration, it merges two blocks based on the maximum intersection of the involved sparse codes while satisfying the constraint on the maximum block size. To illustrate this merging process, a toy dictionary has been created by taking three arbitrary atoms out of a larger dictionary, and arbitrarily selected training data vectors are sparse coded over that toy dictionary. In this illustration, as the dictionary has only three atoms, there are three possible ways to merge any two atoms into a block. On finding the intersections among the obtained sparse codes, the pair of atoms having the largest intersection among the sparse codes is grouped together. Fig. 1 shows the block formation at the very first iteration of the SAC algorithm: for two of the atoms, the match in the indices of the sparse codes happens to be maximum, so these atoms form the first group.
In the existing SAC process, the OMP algorithm has been employed for sparse coding of the data. The OMP, being a greedy iterative algorithm, selects at each step the one atom that is most correlated with the current residual. The selected atoms are highly uncorrelated with each other. Thus, in the SAC process, the formed blocks contain diverse atoms rather than similar ones. Assume a dictionary happens to contain two or more moderately correlated atoms. While sparse coding the data over that dictionary, the OMP algorithm is expected to select any one of them based on the similarity. Fig. 2 graphically displays the OMP-based sparse coding of a target vector over a dictionary having two correlated atoms. After selecting either of them, the current residual will no longer lie in the direction of the correlated atoms; rather, it becomes correlated with another atom in the dictionary. On account of that, there exists a finite possibility that those correlated atoms will appear in different blocks if the SAC process is followed. In a classification task, the existing SAC-based block-structured dictionary may produce sparse codes involving different blocks for enrollment and test data of the same class. This inconsistency in the sparse coding leads to degradation of the classification performance. One can also employ other sparse coding schemes that, unlike the OMP, do not perform orthogonalization while selecting the atoms.
The least angle regression (LARS) [22] algorithm is one such alternative, and we hypothesize that it should provide some improvement in the SAC. Following this argument, we modified the existing SAC process to include LARS-based sparse coding and created a new block-structured dictionary.
For assessing the quality of the block structure, we have computed the pairwise correlations among all atoms of the OMP-SAC-based and the LARS-SAC-based block dictionaries. Both these dictionaries have been initialized with the same K-SVD dictionary while learning. The numbers of atom pairs having a correlation value greater than a chosen threshold in these two dictionaries are plotted in Fig. 3. On comparing the numbers of atom pairs having correlation higher than the chosen threshold, the LARS-SAC-based BK-SVD dictionary is noted to exhibit significantly lower inter-block coherence than the OMP-SAC-based BK-SVD dictionary. Later, in Section VI-B, we also show that the LARS-SAC-based BK-SVD dictionary yields better speaker verification (SV) performance than the existing OMP-SAC-based BK-SVD dictionary. Motivated by the improved block structure quality with increased correlation among grouped atoms, we explore the clustering of the atoms into a block structure based on a correlation criterion rather than the intersection among the indices of the sparse codes. In the next section, we describe the correlation-based greedy clustering algorithm for producing the block-structured dictionary. Following that, we explore the inclusion of class information in block formation to produce a more discriminative block-structured dictionary.
III. Correlation-based Greedy Clustering Algorithm
In this section, we propose an approach for determining the block structure of a given dictionary by exploiting the similarity among its atoms. As the clustering is done on the basis of the pairwise correlations among the dictionary atoms, we refer to this approach as the Correlation-based Greedy Clustering (CGC) algorithm. The flow diagram of this algorithm is shown in Fig. 4(a) along with an example explaining the involved steps.
Initially, no block structure is assumed. From the given dictionary, the absolute correlations among the atom pairs are first computed and arranged in the form of a symmetric matrix. For storing the information about the pair indices, an indicator matrix is created. The first column of this matrix denotes the indices of the dictionary atoms that are yet to be assigned to any block, while each of its rows, excluding the very first entry, simply lists the indices of all the atoms that are available for grouping. At each iteration, the CGC algorithm finds a predefined number (the block size) of top-most correlated atoms. For this purpose, the cumulative sum of the largest correlation values in each row of the correlation matrix is computed and stored in a local vector, and the indices of the selected top correlated atoms are stored in a local selection matrix. The group of atoms that results in the highest cumulative correlation sum forms a new block. After finding the new block, the block structure is updated first. Following that, the correlation and indicator matrices are updated by discarding the correlation and index information of all those atoms that have formed the block. These steps are repeated until all dictionary atoms have been assigned to a block. At each iteration only one block is formed, and when the maximum block size criterion can no longer be satisfied, the remaining atoms are grouped into a smaller block. For fast updates, the order of the indices of the yet-to-be-grouped atoms in the indicator matrix should be kept the same as in the symmetric correlation matrix, as shown in Fig. 4(b).
In the CGC algorithm, with the formation of every new block, the pairwise correlations among the unassigned atoms keep decreasing. This results in the last few unassigned atoms exhibiting very small correlations. Assigning these atoms to a predefined larger block may affect the consistency of the sparse coding. To address this issue, we gradually reduce the maximum block size in steps of one when a small fraction of the total atoms of the dictionary is left to be grouped.
The steps of the CGC algorithm are illustrated in Fig. 4(b) by considering a small arbitrary dictionary. For ease of illustration, the maximum block size is kept small. At the beginning, the dictionary block structure is initialized as empty. In the first iteration, the pair of atoms having the highest correlation is found, so those atoms form the first block. In the second iteration, the highest correlation is noted for another pair of atoms, so the second block is formed by them. In the third iteration, only one atom is left to be assigned, so it alone forms the third block, completing the estimated block structure.
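The CGC procedure can be summarized in code. The sketch below follows the description above (pairwise absolute correlations, greedy selection of the most mutually correlated group, leftover atoms in a smaller final block); the implementation details and names are our own, and the paper's gradual tapering of the block size is approximated here by a single smaller final block.

```python
import numpy as np

def cgc_blocks(D, max_block_size):
    """Correlation-based greedy clustering of dictionary atoms (sketch).

    D : (n, K) matrix of unit-norm dictionary atoms.
    Returns a list of blocks (lists of atom indices).
    """
    C = np.abs(D.T @ D)                      # pairwise absolute correlations
    np.fill_diagonal(C, 0.0)
    remaining = list(range(D.shape[1]))
    blocks = []
    while remaining:
        size = min(max_block_size, len(remaining))
        if size == 1:
            blocks.append(remaining)         # lone leftover atom
            break
        sub = C[np.ix_(remaining, remaining)]
        # cumulative sum of the (size-1) largest correlations in each row
        top_sum = np.sort(sub, axis=1)[:, -(size - 1):].sum(axis=1)
        seed = int(np.argmax(top_sum))       # row giving the best candidate group
        partners = np.argsort(sub[seed])[-(size - 1):]
        chosen = sorted({seed} | {int(p) for p in partners})
        blocks.append([remaining[i] for i in chosen])
        remaining = [a for i, a in enumerate(remaining) if i not in chosen]
    return blocks
```

On a toy dictionary with two strongly correlated atom pairs, the algorithm groups each correlated pair into its own block.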
IV. Classification-Driven Block-Structured Dictionary
In addition to the SAC-BK-SVD based dictionary, we find a proposal of an intra-block coherence suppression (IBCS) [3] based block-structured dictionary in the context of the image recognition task. Unlike the SAC-BK-SVD approach, in IBCS dictionary learning the class labels are used in determining the blocks, and the so-obtained block structure is kept fixed during dictionary learning. On exploring it in the SV task, we found that the IBCS-based block-structured dictionary outperforms both the SAC-BK-SVD and CGC-BK-SVD based dictionaries. Despite having an unadapted block structure, the improved detection cost obtained for the IBCS dictionary highlights the impact of class supervision in block formation. Motivated by that, we propose a novel block structuring scheme that allows the grouping of atoms within a class only.
The proposed scheme is intended to enhance the ability of a dictionary to produce more discriminative sparse codes. The discriminative sparse codes should exhibit the following property:

(2)   $X_c[\mathcal{I}_{c'}, :] = \mathbf{0} \quad \forall\, c' \neq c$

where $X_c$ is the sparse coefficient matrix for the $c$-th class training dataset and $\mathcal{I}_c$ denotes the indices of the atoms in the block structure corresponding to the $c$-th class. Therefore, in the ideal case, all nonzero coefficients for the $c$-th class data correspond to class-$c$ blocks only.
Towards achieving this goal, we define an objective function for learning a discriminative dictionary as

(3)   $\min_{D, X} \; \sum_{c=1}^{C} \|Y_c - D X_c\|_F^2 \;+\; \lambda_1 \sum_{j=1}^{m} \big\| D[\mathcal{I}_j]^{\top} D[\mathcal{I}_j] - I \big\|_F^2 \;+\; \lambda_2 \sum_{c=1}^{C} \sum_{c' \neq c} \big\| X_c[\mathcal{I}_{c'}, :] \big\|_F^2$

where $Y_c$ is the $c$-th class training data, $D_c = D[\mathcal{I}_c]$ is the $c$-th class sub-dictionary, $m$ is the number of blocks in the dictionary, $D[\mathcal{I}_j]$ and $\mathcal{I}_j$ denote the $j$-th block and its indices, respectively, and $\lambda_1, \lambda_2$ are weighting factors. In (3), the first term ensures good reconstruction ability, the second term reduces the intra-block redundancy, and the third term enhances the discrimination in sparse codes to aid the classification. The simultaneous optimization of all three constraints in (3) may not be feasible.
Here, we wish to highlight that the first two constraints in (3) can be optimized by invoking the existing BK-SVD dictionary learning technique. In fact, it would ensure that all intra-block coherences are zero. The formation of blocks using either the SAC or the proposed CGC algorithm does not utilize the class information, so the atoms within the blocks may belong to two or more classes. As a result, the dominant coefficients in the sparse coding of training data belonging to different classes could involve the same set of blocks. With multi-class data being involved with a block, the updated dictionary loses the ability to produce more discriminative sparse codes. Towards addressing this issue, we have explored the inclusion of class supervision in the block formation. In the following subsections, the details of supervised block structuring and the dictionary update using the well-known SVD approach are presented. An overview of learning the proposed dictionary is given in Algorithm 1.
IV-A Block structure: initialization and update
For including class supervision in the block formation, the dictionary is first initialized by selecting a predefined number of examples from each of the classes. Let the class indices for such a dictionary be stored in a vector defined as

(4)   $\mathbf{c} = [\,\underbrace{1, \ldots, 1}_{n_1},\; \underbrace{2, \ldots, 2}_{n_2},\; \ldots,\; \underbrace{C, \ldots, C}_{n_C}\,]^{\top}$

where $n_c$ is the number of examples in the $c$-th class and $C$ is the number of classes.
Now, for optimizing the block structure within each class, the proposed CGC algorithm is invoked in a constrained manner. To preserve the class supervision, appropriate constraints are introduced in the merging process of the CGC algorithm to allow the grouping of atoms from the same class only. A simple way to achieve this is to perform block structuring separately for each class while indexing the blocks uniquely across the classes.
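This per-class constrained grouping can be sketched as follows. The names are hypothetical and the within-class grouping here is a stripped-down greedy variant of CGC (seed atom plus its most correlated within-class partners), not the authors' exact routine.

```python
import numpy as np

def supervised_cgc(D, class_ids, max_block_size):
    """Class-supervised block structuring (sketch): atoms are grouped by
    correlation within each class only; blocks are indexed uniquely across
    classes by construction of the returned list."""
    blocks = []
    for c in np.unique(class_ids):
        idx = list(np.where(class_ids == c)[0])
        C = np.abs(D[:, idx].T @ D[:, idx])   # within-class correlations
        np.fill_diagonal(C, 0.0)
        remaining = list(range(len(idx)))
        while remaining:
            size = min(max_block_size, len(remaining))
            seed = remaining[0]
            # most correlated within-class partners of the seed atom
            partners = sorted((j for j in remaining if j != seed),
                              key=lambda j: -C[seed, j])[:size - 1]
            chosen = [seed] + partners
            blocks.append([idx[j] for j in chosen])
            remaining = [r for r in remaining if r not in chosen]
    return blocks
```

By construction, no block can mix atoms from different classes, which is the property the proposed scheme enforces.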
IV-B Dictionary Update
Given the block structure and the initial dictionary, the training data is sparse coded using the block-OMP (BOMP) algorithm [23]. For updating the dictionary, all training data vectors associated with each of the blocks in the resulting sparse codes are collected. Let $\omega_j^c$ denote the list of indices of all those training data that have nonzero sparse coefficients for the $j$-th block in the $c$-th class. For updating that block, the representation error excluding its contribution is computed as
(5)   $E_j^c = Y_{\omega_j^c} - \sum_{i \neq j} D[\mathcal{I}_i^c]\, X[\mathcal{I}_i^c, \omega_j^c]$

where $E_j^c$ is the error matrix, i.e., the data for the $j$-th block in the $c$-th class, $m_c$ is the number of blocks in the $c$-th class, $\mathcal{I}_j^c$ is the set of indices of the $j$-th block in the $c$-th class, and the remaining terms have the usual meaning. Now $E_j^c$ is factorized as $E_j^c = U \Delta V^{\top}$ using the SVD algorithm. The representation error is minimized by replacing the dictionary atoms and the selected sparse coefficients with the top-rank components obtained using the SVD as

$D[\mathcal{I}_j^c] = U_{(:,1:s_j)}$ and

(6)   $X[\mathcal{I}_j^c, \omega_j^c] = \Delta_{(1:s_j)} V_{(:,1:s_j)}^{\top}$

where $s_j$ is the size of the $j$-th block.
Both the dictionary and its block structure are updated until convergence or for a predetermined number of iterations. Obviously, the atoms in each updated block happen to be orthonormal to each other. Thus, one of the criteria laid out in (3) is met perfectly.
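The block update described by (5) and (6) can be sketched as follows. The names and the in-place update style are our own; atoms of a block without a matching singular component are simply left unchanged in this sketch.

```python
import numpy as np

def update_block(Y, D, X, blocks, j):
    """Update the j-th block of D (and its coefficients) via a truncated SVD,
    in the spirit of the BK-SVD block update.

    Y : (n, N) training data, D : (n, K) dictionary,
    X : (K, N) sparse codes, blocks : list of index arrays.
    """
    b = np.asarray(blocks[j])
    # signals whose codes actually use this block
    omega = np.nonzero(np.abs(X[b, :]).sum(axis=0))[0]
    if omega.size == 0:
        return D, X
    # representation error without the j-th block's contribution
    E = Y[:, omega] - D @ X[:, omega] + D[:, b] @ X[np.ix_(b, omega)]
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    r = min(len(b), len(s))
    D[:, b[:r]] = U[:, :r]                       # orthonormal block atoms
    X[np.ix_(b[:r], omega)] = s[:r, None] * Vt[:r, :]
    return D, X
```

When the data is exactly representable by the block, the update preserves the reconstruction while making the block atoms orthonormal, as the text above states.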
V. Experiments on the Synthetic Data
In this section, we evaluate the proposed CGC algorithm for the recovery of the underlying block structure and compare the reconstruction errors for the K-SVD based block dictionaries learned using the OMP-SAC and CGC-based block structuring algorithms on synthetic data. All experiments are repeated multiple times and their averaged performances are reported.
V-A Block recovery
For this study, we created block dictionaries from randomly initialized matrices. For each column of an initial random matrix, additional clones were derived by adding random noise at varying scales. On collecting all these columns, we obtained an initial dictionary with an oracle block structure in which each block contains a seed atom and its clones. For studying the effect of the degree of correlation among the initial dictionary atoms, separate dictionaries were created following the above-outlined procedure, with the average intra-block correlation being controlled by the noise added during cloning. For all such synthetically created initial dictionaries, the average inter-block correlation for the top atom pairs is found to lie in a narrow range.
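The cloning procedure described above can be sketched as follows; the function and parameter names are hypothetical, and the exact sizes and noise scales used in the paper are not given here.

```python
import numpy as np

def make_block_dictionary(n, num_blocks, block_size, noise_scale, rng):
    """Synthesize a dictionary with a known (oracle) block structure:
    each block is one random seed atom plus noisy clones of it.

    Returns the (n, num_blocks*block_size) dictionary and the oracle blocks.
    The intra-block correlation is controlled by `noise_scale`.
    """
    seeds = rng.standard_normal((n, num_blocks))
    atoms, blocks = [], []
    for j in range(num_blocks):
        idx = []
        for _ in range(block_size):
            a = seeds[:, j] + noise_scale * rng.standard_normal(n)
            a = a / np.linalg.norm(a)        # unit-norm atoms
            idx.append(len(atoms))
            atoms.append(a)
        blocks.append(idx)
    return np.column_stack(atoms), blocks
```

Small noise scales give near-duplicate atoms within a block (high intra-block correlation), while atoms from different blocks remain only weakly correlated.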
Fig. 5 shows the block recovery performance of the CGC algorithm for varying intra-block correlation and block size. A block is considered to be recovered only when the estimated block indices are identical to those in the oracle block structure. The CGC algorithm is noted to recover the underlying block structure perfectly when the average intra-block correlation is above a certain value, regardless of the block size considered, whereas when the average intra-block correlation drops below that value, the accuracy of block recovery drops sharply owing to the decreasing gap between the intra- and inter-block correlations. One might wonder whether such high correlations exist in dictionaries created from real (speech) data; Fig. 3 has already shown that they do.
V-B Reconstruction performance
We now evaluate the CGC-BK-SVD dictionary in terms of reconstruction performance under varying conditions while contrasting it with the existing OMP-SAC-BK-SVD dictionary. The reconstruction performance for a synthetic data matrix over a learned dictionary is computed through sparse coding with a chosen block sparsity and is defined in terms of the resulting reconstruction error.
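The paper's exact error definition is not reproduced here; a commonly used choice for such comparisons is the relative Frobenius-norm reconstruction error, sketched below with illustrative names.

```python
import numpy as np

def relative_reconstruction_error(Y, D, X):
    """Relative Frobenius-norm reconstruction error, ||Y - D X||_F / ||Y||_F.
    (A common metric for dictionary evaluation; the paper's exact
    definition may differ.)"""
    return float(np.linalg.norm(Y - D @ X) / np.linalg.norm(Y))
```

A perfect reconstruction gives an error of zero; any mismatch between the data and its sparse approximation yields a positive value.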
For generating the data for dictionary learning, we first created a dictionary using the procedure outlined in Section V-A. For this dictionary, the block size and the average intra-block correlation are kept fixed. From the so-created dictionary having a known block structure, a weighted sum of randomly selected blocks is computed to generate each synthetic data vector. Following this scheme, the data samples for experimentation are derived. For assessing the robustness of the dictionary learning approaches, noisy versions of the synthetic data are also created by adding white Gaussian noise at different signal-to-noise ratios (SNRs).
For learning both the OMP-SAC- and CGC-based block-structured dictionaries, the same K-SVD-learned dictionary is used as initialization. Unlike the former, which iteratively updates both the dictionary and the block structure, in the latter only the dictionary is updated iteratively while keeping the block structure estimated in the first iteration fixed. Therefore, we studied the effect of the number of iterations involved in dictionary learning in the two cases. The reconstruction errors for this study are tabulated in Fig. 6(a). The CGC approach not only converges faster but also outperforms the contrast approach.
Fig. 6(b) shows the impact of the addition of noise to the dictionary learning data for the OMP-SAC and CGC-based block dictionaries. The table lists the reconstruction performance for the noiseless data with respect to dictionaries learned using noisy data. Note that the reconstruction error for the CGC case at one of the considered SNRs matches that for the OMP-SAC case under noiseless data. Thus, the CGC approach also maintains its edge over the OMP-SAC approach under noisy data.
In the CGC-based approach, the initial dictionary is clustered based on the pairwise correlations among its atoms. Thus, we hypothesize that the tuning of the block size during dictionary learning is less critical than in the OMP-SAC-based approach. Fig. 6(c) lists the reconstruction errors for varying block sizes employed in dictionary learning, and these results support our hypothesis.
In the earlier discussed synthetic data generation process, the variability of the generated data depends on the number of blocks selected from the oracle block dictionary. For assessing the modeling ability of the proposed approach, data sets of different variability are created by varying the number of blocks employed while generating the synthetic data. Separate dictionaries are learned using both approaches on those data sets, and the corresponding reconstruction errors are tabulated in Fig. 6(d). Even for higher variability data, the CGC-based approach is noted to yield better reconstruction performance than the OMP-SAC-based approach.
VI. Evaluation of Classification Performance
In this section, we evaluate the impact of the proposed innovations in the context of the speaker verification (SV) task. The different SV systems developed in this work are evaluated on the telephone condition test data sets in the NIST 2012 SRE [24]. Different sparse representation based SV (SR-SV) systems using unsupervised (K-SVD, (OMP/LARS)-SAC-BK-SVD) and supervised (IBCS and block-BK-SVD) learned dictionaries are developed. The i-vector [25] Gaussian probabilistic linear discriminant analysis (G-PLDA) [26] based SV system is also created for primary contrast.
VI-A Experimental Setup
The setup employed for the real data experiments is identical to that of our earlier work [9]. In the following, we briefly mention only the essential details; for a more detailed description of the database, performance measure, and signal processing, the reader is referred to [9]. The SRE12 speech data set contains both female and male speakers. The test set contains telephone-recorded speech utterances, from which a large number of verification trials are created. The test data is partitioned into three subsets based on the environment and noise conditions. For the development of the SV systems, the speakers' utterances from the NIST SRE06, SRE08, and SRE10 data sets have been used. From the development data, female and male i-vectors are derived. The speech data is analyzed to compute the commonly used mel-frequency cepstral coefficients. These are then augmented with their delta and double-delta coefficients, resulting in the final feature vectors. A Gaussian mixture based universal background model (UBM) [27] is used in gender-dependent modeling. For evaluating the performance of the different SV systems developed, the BOSARIS toolkit [28] has been used. The performance of the SV systems is measured using the detection cost function as per the NIST protocol [24]. It is defined as the mean of the normalized detection costs corresponding to two low values of the target prior probability. Being evaluated at low false-alarm rates, the measure suits high-security applications. The cosine distance scoring (CDS) measure is used to find the scores for the SR-SV methods.
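The cosine distance scoring used in the SR-SV systems amounts to a normalized inner product between enrollment and test representations; a minimal sketch with illustrative names:

```python
import numpy as np

def cds_score(x_enroll, x_test):
    """Cosine similarity between enrollment and test (sparse) vectors;
    higher scores indicate a better speaker match."""
    return float(x_enroll @ x_test
                 / (np.linalg.norm(x_enroll) * np.linalg.norm(x_test)))
```

Identical vectors score 1.0, orthogonal vectors score 0.0, so verification reduces to thresholding this score.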
For developing the SV systems, the telephone-recorded development data is partitioned into two parts: train and test. The initial system for tuning the parameters is trained on the dev-train dataset and then evaluated on the dev-test dataset. The development trials are created from the dev-test data containing both female and male speakers. All SV systems explored in this work are modeled in a gender-dependent manner.
VI-A1 Factor Analysis based Modeling
Gender-dependent UBMs are learned using the telephone speech data derived from the female and male speakers. The utterances in the development data, being of varying duration, are redistributed to have a uniform average duration after voice activity detection. In the i-vector based contrast SV system, the representation vectors are derived using the total variability matrix (T-matrix) learned on the telephone-recorded data. The G-PLDA modeling uses low-rank speaker subspaces and includes whitening and length normalization followed by projection onto the unit sphere. The G-PLDA parameters are learned using pooled telephone and microphone recorded development data i-vectors. In the SR-SV systems, a gender-dependent joint factor analysis (JFA) [29] is employed for session/channel compensation of the Gaussian mixture model (GMM)-UBM mean supervectors. Following that, the speaker factors are derived; further details of the same are available in our earlier work [30].

TABLE I: SV performance of the contrast and proposed systems on the NIST SRE12 telephone test conditions.

                                                  Block Structure                     Detection Cost              %EER
SV System (Dictionary)   Feature/Classifier    Size      Class Sup.  Updation   Code   TC2    TC4    TC5    Avg.   Avg.
Contrast
  T-matrix               i-vector/Bayes        -         -           -          S0     0.411  0.543  0.446  0.467  5.22
  JFA-matrix             spk-factor/CDS        -         -           -          S1     0.523  0.634  0.552  0.570  5.72
  K-SVD                  sparse-vector/CDS     -         -           -          S2     0.494  0.605  0.516  0.538  9.94
  OMP-SAC-BK-SVD         sparse-vector/CDS     variable  no          adapted    S3     0.447  0.527  0.427  0.467  14.40
  LARS-SAC-BK-SVD        sparse-vector/CDS     variable  no          adapted    S4     0.428  0.508  0.430  0.455  11.31
  IBCS                   sparse-vector/CDS     variable  yes         unadapted  S5     0.450  0.475  0.399  0.441  17.72
  Sup. block-BK-SVD      sparse-vector/CDS     variable  yes         unadapted  S6     0.438  0.497  0.408  0.447  13.22
Proposed
  CGC-BK-SVD             sparse-vector/CDS     fixed     no          adapted    S7     0.424  0.511  0.410  0.448  12.23
  CGC-BK-SVD             sparse-vector/CDS     variable  no          adapted    S8     0.422  0.502  0.403  0.442  12.25
  Sup. CGC-BK-SVD        sparse-vector/CDS     variable  yes         adapted    S9     0.386  0.443  0.366  0.398  13.51
Fusion of systems
  S0+S5                                                                                0.362  0.434  0.364  0.387  4.34
  S0+S9                                                                                0.313  0.397  0.336  0.349  4.14
VI-A2 Sparse Representation based Modeling
The gender-dependent K-SVD dictionaries are randomly initialized by selecting development data utterances for the female and male cases, respectively, and are learned over a fixed number of iterations. The unsupervised BK-SVD dictionaries are initialized with the corresponding K-SVD dictionaries, whereas each of the supervised dictionaries is initialized with class-specific K-SVD-learned sub-dictionaries having a fixed maximum number of atoms per class (speaker). All kinds of dictionaries are trained on the speaker factors pooled from both the telephone and the microphone development data. Learning of the IBCS and SAC-BK-SVD based dictionaries usually requires more iterations, while the CGC-BK-SVD and supervised CGC-BK-SVD based dictionaries are noted to converge in fewer iterations. All the block-structured dictionaries are learned keeping the block size and the block sparsity fixed. In all SR-SV systems, the sparse coding of the enrollment and test data is done using the BOMP algorithm. The coding over the unsupervised and the supervised dictionaries employs separately chosen sparsity values.
VI-B Results and Discussions
In this subsection, first the performances of the proposed block structuring approach and the class-supervised block-structured dictionary based SV systems are discussed. Following that, the robustness of the proposed SR-SV system is evaluated. Finally, the results of the fusion of the proposed SR based and the i-vector based SV systems are presented.
VI-B1 SV performance evaluation
The system performances are primarily evaluated in terms of the detection cost, and the corresponding equal error rates (EERs) are reported only for reference purposes. The three conditions in the NIST SRE12 telephone test data sets are referred to as TC2, TC4 and TC5. The performances of the different proposed and contrast SV systems are presented in Table I. On comparing the i-vector G-PLDA (S0 system) and the K-SVD dictionary (S2 system) based SV approaches, we note that the former significantly outperforms the latter. As the K-SVD dictionary is learned without any supervision or block structure, a direct comparison between the S0 and S2 systems may not be fair. For this purpose, we also performed CDS directly on the speaker factors, and the resulting SV approach (S1 system) is found to be inferior to the S2 system in terms of detection cost. In an earlier work, it is shown that the block-structured K-SVD dictionary outperforms the simple K-SVD dictionary in the context of SR-SV [16]. The remaining performances given in Table I are discussed next in the context of the two enhancements proposed for learning the block-structured dictionary.
Modified block structuring approach:
In Section II-A, it is shown that the LARS-SAC-based dictionary has reduced inter-block correlations; thus it is expected to yield improved SR-SV performance. From Table I, we note that the LARS-SAC-BK-SVD dictionary (S4 system) results in relative improvements in both detection cost and EER over the OMP-SAC-BK-SVD dictionary (S3 system). With fixed-size block formation, the CGC-BK-SVD dictionary (S7 system) is noted to yield relative improvements in detection cost when compared with the S3 and S4 systems, respectively. Further, on allowing variable block sizes in the CGC-BK-SVD dictionary (S8 system), an additional relative improvement in detection cost is obtained.
Supervision in the block formation:
In Table I, the S5 system refers to the evaluation of the recently proposed IBCS dictionary learning approach in the SV task. In IBCS dictionary learning, a class-supervised block structure is employed and the intra-block coherences are minimized using a gradient approach without updating the block structure. In contrast, all previously discussed block dictionaries are learned using the SVD along with updating the block structure. Therefore, for a direct contrast with the IBCS dictionary, it is interesting to explore the impact of employing the supervised block structure in SVD-based block dictionary learning. For this purpose, we have learned a block dictionary using the BK-SVD algorithm but with a class-supervised block structure that is not updated during learning. The resulting block dictionary based SV system is referred to as the S6 system. It can be seen from Table I that the S5 and S6 systems yield similar SV performances. These results demonstrate the impact of class supervision of the block structure in dictionary learning. The second proposal, combining class supervision with the CGC approach for deriving the block-structured dictionary, is referred to as the S9 system. It can be noted that the S9 system consistently outperforms the previously discussed systems on all three test sets in the primary measure.
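A class-supervised block structure of the kind described above ties each dictionary atom to the class (speaker) whose data initialized it, and that grouping stays fixed while the atoms themselves are updated. A minimal sketch, assuming per-atom class labels are available; the labels below are hypothetical:

```python
from collections import defaultdict

def class_supervised_blocks(atom_labels):
    """Group atom indices into blocks by class label; the structure stays fixed."""
    groups = defaultdict(list)
    for idx, label in enumerate(atom_labels):
        groups[label].append(idx)
    return [groups[label] for label in sorted(groups)]

# Hypothetical labels: atoms 0-2 from speaker "A", atoms 3-4 from speaker "B"
blocks = class_supervised_blocks(["A", "A", "A", "B", "B"])
```

Because the mapping from atoms to classes never changes, only the atom contents are refined during learning, matching the fixed-structure setting of the S5 and S6 systems.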
VI-B2 Sensitivity to maximum block size
In the previous subsection, the SV performances of all block-structured dictionary based systems correspond to a fixed maximum block size. We have explored varying the maximum block size over a range for three systems (S3, S4, and S9), and the corresponding test-condition averaged detection costs are given in Fig. 7. For the chosen block-size range, the relative deviations in the averaged detection cost are noted for the S3, S4 and S9 systems, respectively. From these results, it can be inferred that the supervised CGC-BKSVD approach is more robust to variation in the block size. Since the CGC algorithm employs greedy selection and the blocks are updated using the SVD, after a few iterations the intra-class atoms become nearly uncorrelated irrespective of the block-size constraint. This could be the possible reason behind the low sensitivity exhibited by the proposed approach.
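Enforcing a maximum block size while still allowing variable block sizes can be realized by splitting any oversized class block into sub-blocks. A minimal sketch of one such splitting rule, our own illustrative choice rather than necessarily the authors' exact scheme:

```python
def split_to_max_size(block, max_size):
    """Split an atom-index block into consecutive sub-blocks of at most max_size atoms."""
    return [block[i:i + max_size] for i in range(0, len(block), max_size)]

# A 7-atom class block under a maximum block size of 3
parts = split_to_max_size(list(range(7)), max_size=3)  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Under this rule, blocks naturally take variable sizes (the last sub-block may be smaller), so the maximum size acts only as an upper bound.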
VI-B3 Exploiting system diversity
The various SV systems explored in this work mainly differ in the criteria employed for dictionary learning and scoring. More specifically, the i-vector based SV approach involves factor analysis for learning the T-matrix and the GPLDA for scoring, whereas the SR-SV approaches involve cluster-wise eigendecomposition for learning the dictionaries and use the CDS for scoring. To highlight the complementary behavior of these approaches, the DET curves for a few salient systems are plotted in Fig. 8. To exploit this diversity, the logistic regression based score-level fusion of the i-vector system (S0) with the two best SR-SV systems (S5 and S9) is explored, and the results are also shown in Table I. The best-performing fusion (S0+S9) provides relative improvements in terms of the detection cost and the EER when compared with the best individual performances.

VII Conclusion
In this paper, a novel correlation based block formation approach, referred to as CGC, is presented for learning a block-structured dictionary. The CGC-based block dictionary yields improved reconstruction and classification performances. In contrast to the existing SAC-based approach, the proposed one exhibits faster convergence, lower sensitivity to the block size, and greater robustness to additive noise while learning the dictionary. For further enhancement of the classification ability, class information is included in the CGC. The resulting block-structured dictionary based SR-SV system provides a relative improvement in the detection cost over the best contrast SR-SV system employing an existing supervised block-structured dictionary. On fusing the best proposed SR-SV and the state-of-the-art i-vector SV systems, significant improvements in the classification performance are noted in terms of both the detection cost and the equal error rate.
References

[1] M. Elad and M. Aharon, “Image denoising via learned dictionaries and sparse representation,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2006, pp. 895–900.
 [2] S. Gao, I. W.-H. Tsang, and Y. Ma, “Learning category-specific dictionary and shared dictionary for fine-grained image categorization,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 623–634, Feb. 2014.
 [3] Y.T. Chi, M. Ali, A. Rajwade, and J. Ho, “Block and group regularized sparse modeling for dictionary learning,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 377–382.
 [4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, Feb. 2009.
 [5] V. M. Patel, T. Wu, S. Biswas, P. J. Phillips, and R. Chellappa, “Dictionary-based face recognition under variable lighting and pose,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 954–965, Jun. 2012.
 [6] Z. Dong, M. Pei, and Y. Jia, “Orthonormal dictionary learning and its application to face recognition,” Image and Vision Computing, vol. 51, pp. 13–21, Jul. 2016.
 [7] I. Naseem, R. Togneri, and M. Bennamoun, “Sparse representation for speaker identification,” in Proc. IEEE International Conference on Pattern Recognition (ICPR), 2010, pp. 4460–4463.
 [8] J. M. K. Kua, E. Ambikairajah, J. Epps, and R. Togneri, “Speaker verification using sparse representation classification,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4548–4551.
 [9] Haris B.C. and R. Sinha, “Robust speaker verification with joint sparse coding over learned dictionaries,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2143–2157, Oct. 2015.
 [10] M. Liu, X. Chen, and X. Wang, “Latent fingerprint enhancement via multiscale patch based sparse representation,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 1, pp. 6–15, Jan. 2015.
 [11] O. P. Singh, Haris B.C., and R. Sinha, “Language identification using sparse representation: A comparison between GMM supervector and ivector based approaches,” in Proc. Annual IEEE India Conference (INDICON), 2013, pp. 1–4.
 [12] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
 [13] F. Rodriguez and G. Sapiro, “Sparse representations for image classification: Learning discriminative and reconstructive nonparametric dictionaries,” University of Minnesota, IMA Preprint 2213, Tech. Rep., Dec. 2007.
 [14] Z. Jiang, Z. Lin, and L. Davis, “Label consistent K-SVD: Learning a discriminative dictionary for recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2651–2664, Nov. 2013.
 [15] L. Zelnik-Manor, K. Rosenblum, and Y. C. Eldar, “Dictionary optimization for block-sparse representations,” IEEE Transactions on Signal Processing, vol. 60, no. 5, pp. 2386–2395, May 2012.
 [16] G. Sreeram, Haris B.C., and R. Sinha, “Improved speaker verification using block sparse coding over joint speaker-channel learned dictionary,” in Proc. IEEE Region 10 Conference (TENCON), 2015, pp. 1–5.
 [17] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” arXiv preprint arXiv:0804.0041, 2008.
 [18] Y. C. Eldar and H. Bolcskei, “Block-sparsity: Coherence and efficient recovery,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 2885–2888.
 [19] Y. C. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 5302–5316, Nov. 2009.
 [20] Y. C. Eldar and H. Rauhut, “Average case analysis of multichannel sparse recovery using convex relaxation,” IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 505–519, Jan. 2010.
 [21] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” Technion, Tech. Rep. CS-2008-08, Apr. 2008.
 [22] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” The Annals of Statistics, vol. 32, no. 2, pp. 407–499, Apr. 2004.
 [23] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3042–3054, Jun. 2010.
 [24] The NIST Year 2012 Speaker Recognition Evaluation Plan, www.nist.gov/itl/iad/mig/upload/NIST SRE12 evalplanv17r1.pdf.
 [25] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788–798, May 2011.
 [26] D. Garcia-Romero and C. Y. Espy-Wilson, “Analysis of i-vector length normalization in speaker recognition systems,” in Proc. Interspeech, 2011, pp. 249–252.
 [27] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, pp. 19–41, Jan. 2000.
 [28] The BOSARIS toolkit, accessed on 10th Dec. 2013. [Online]. Available: www.sites.google.com/site/bosaristoolkit/
 [29] P. Kenny, G. Boulianne, P. Ouellet, and P. Dumouchel, “Speaker and session variability in GMM-based speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1448–1460, May 2007.
 [30] N. Kumar and R. Sinha, “Class specificity and commonality based discriminative dictionary for speaker verification,” in Proc. National Conference on Communication (NCC), 2016, pp. 1–6.