## 1 Introduction

In recent years, image classification has remained a classical problem in pattern recognition. With advances in theory, many image classification methods have been proposed yang2014sparse ; wright2009robust ; liu2019group ; zhang2011sparse ; seo2016covariant ; yang2009linear ; liu2014class ; liu2016face ; liu2017class ; castrodad2012sparse ; chang2015stacked ; aharon2006k ; zhang2010discriminative ; jiang2013label ; yu2012adaptive ; chan2015pcanet ; xu2019sparse ; wan2018rethinking ; tokozume2017between ; song2018euler ; he2016deep . Among these methods, dictionary learning (DL) based methods form one category that has contributed substantially to image classification. DL is a generative model whose concept was first proposed by Mallat *et al.* mallat1993matching . A few years later, Olshausen *et al.* olshausen1996emergence ; olshausen1997sparse applied DL to natural images, and it has since been widely used in many fields, such as image denoising li2018joint ; li2012efficient , image super-resolution gao2018self ; wang2012semi , and image classification fawzi2015dictionary ; sun2016learning . According to how they utilize discriminative information, DL methods can be split into two categories: i) class specific dictionary learning and ii) class shared dictionary learning.

The class specific dictionary learning method utilizes discriminative information by adding discrimination ability to the dictionary; a separate dictionary is learned for each class. This category captures the representative feature information of a class: the features shared by most samples of the class are emphasized, while features that only a few samples have are ignored to some extent. That is, the learned dictionary places higher weight on the features of samples close to the distribution center and lower weight on the features of samples far from the center. As a result, some abnormal sample points are ignored, which improves the robustness of the learned dictionary. Many classical class specific dictionary learning algorithms have been reported in recent years, such as wang2012supervised ; yang2014sparse ; liu2016face . However, the dictionary learned by this approach has a drawback: because a dictionary is learned for each class, the training samples of each class are mapped to a separate subspace, which leads to redundancy among the base vectors of different subspaces. For example, in face datasets, the features of eyes are similar across classes, so similar base vectors may be obtained for different classes. During the testing stage, it is then hard to select the base vector that belongs to the same class as the testing sample to fit its eye features. Thus, although this approach describes the training samples well, it is not conducive to representing the testing samples when the dictionaries of all classes are cascaded together.

For the class shared dictionary learning method, the discriminative information is directly embedded into the objective function to learn a single dictionary for all classes. With this method, the training samples from all classes are mapped into one subspace, so the representative feature information of all classes can be adopted. However, it cannot describe the samples of each specific class well. Moreover, most class shared dictionary learning methods use the $\ell_0$-norm regularization term as the sparse constraint, which leads to an NP-hard natarajan1995sparse problem. Although greedy methods such as orthogonal matching pursuit (OMP) tropp2007signal can help solve this problem to some extent, they usually find a suboptimal sparse solution instead of the optimal one.

Comparing class specific dictionary learning with class shared dictionary learning, it is clear that the two methods have complementary advantages. Classification accuracy can be boosted significantly if the advantages of the two dictionary learning methods are properly combined. In this paper, we first propose a novel class shared dictionary learning algorithm named label embedded dictionary learning (LEDL). This method introduces the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization of LC-KSVD jiang2013label . We then propose a novel network named hybrid dictionary learning network (HDLN) that combines a class specific dictionary learning method with a class shared dictionary learning method.

Our network contains two layers. Specifically, the first layer consists of the class specific dictionary learning for sparse representation (CSDL-SRC) liu2016face method, which is used to extract the crucial feature information of each class, wipe off singular points, and improve robustness. The second layer is composed of LEDL, which pulls the feature information belonging to different subspaces back into the same subspace to capture the relationships among different classes. Figure 1 shows the variation of the sample distribution in three stages: first, randomly distributed samples belonging to three classes; second, the samples of the same class are clustered, while the three classes lie in different subspaces; third, the samples in different subspaces are pulled back into the same subspace. A schematic description of our proposed HDLN is given in Figure 2. We adopt the alternating direction method of multipliers (ADMM) boyd2011distributed algorithm and the blockwise coordinate descent (BCD) liu2014blockwise algorithm to optimize HDLN. The contributions of this work are four-fold:

1) We propose a novel class shared dictionary learning method named label embedded dictionary learning (LEDL) that introduces the $\ell_1$-norm regularization term as the sparse constraint. With the $\ell_1$-norm sparse constraint, the optimal sparse solution is easier to find.

2) We propose a novel dictionary learning network named hybrid dictionary learning network (HDLN), in which discriminative information is used in different ways to fully describe the features while maintaining the discriminative information. The HDLN can be considered an extension of conventional dictionary learning algorithms.

3) We propose to utilize the alternating direction method of multipliers (ADMM) boyd2011distributed algorithm and the blockwise coordinate descent (BCD) liu2014blockwise algorithm to optimize the dictionary learning task of each layer.

4) The proposed LEDL and HDLN methods are evaluated on six benchmark datasets, and the results verify the superior performance of our methods.

The rest of this paper is organized as follows. Section 2 briefly reviews related work on CSDL-SRC and LC-KSVD. Section 3 presents the LEDL and HDLN methods for image classification. The optimization approach is elaborated in Section 4. Section 5 shows experimental results on six well-known datasets. Finally, Section 6 concludes the paper.

## 2 Related work

In this section, we review two related dictionary learning methods: class specific dictionary learning for sparse representation (CSDL-SRC) and label consistent K-SVD (LC-KSVD).

### 2.1 Class specific dictionary learning for sparse representation (CSDL-SRC)

Liu *et al.* liu2016face proposed CSDL-SRC to reduce the high residual error and instability of SRC. The authors consider the weight of each sample feature when generating the dictionary. Assume that is the training sample matrix, where represents the dimensions of the sample features, and are the number of training samples and the class number of training samples, respectively. The class of training sample matrix is denoted as , where and is the class of (). Liu *et al.* build a weight coefficient matrix for , where is the dictionary size of CSDL-SRC and is the class of (). The objective function of CSDL-SRC is as follows:

(1) |

where is the sparse codes of , the $\ell_1$-norm regularization term is utilized to enforce sparsity, and is the regularization parameter that controls the tradeoff between goodness of fit and sparseness. The denotes the column vector of matrix .

### 2.2 Label consistent K-SVD (LC-KSVD)

Jiang *et al.* jiang2013label proposed LC-KSVD to combine the discriminative sparse codes error with the reconstruction error and the classification error to form a unified objective function which is defined as follows:

(2) |

where is the sparsity constraint factor, is the dictionary matrix of , is the sparse codes matrix of .

is a classifier matrix learned from the given label matrix . We hope can return the most probable class this sample belongs to. represents the discriminative sparse codes matrix and is a linear transformation matrix that relies on . and are the regularization parameters that balance the contributions of the discriminative sparse codes error and the classification error to the overall objective function, respectively.

Here, CSDL-SRC is a class specific dictionary learning method, while LC-KSVD is a class shared dictionary learning method. The difference between the two methods is shown in Figure 3.

represents the zero matrix.
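To make the three error terms of LC-KSVD concrete, the following minimal sketch evaluates an objective of the form described by Jiang *et al.* jiang2013label : reconstruction error plus weighted discriminative sparse codes error plus weighted classification error. All names (`X`, `D`, `S`, `Q`, `A`, `H`, `W`, `alpha`, `beta`) and the helper `label_matrices` are assumptions chosen for illustration and need not match the symbols of Equation 2. The sparsity constraint itself is not enforced here.

```python
import numpy as np

def label_matrices(sample_labels, atom_labels, n_classes):
    """Build a one-hot label matrix H (classes x samples) and a
    discriminative sparse codes matrix Q (atoms x samples).

    Q[k, i] is 1 when atom k and sample i share a class label and 0
    otherwise, so Q has zero blocks for mismatched classes.
    """
    sample_labels = np.asarray(sample_labels)
    atom_labels = np.asarray(atom_labels)
    n_samples = sample_labels.size
    H = np.zeros((n_classes, n_samples))
    H[sample_labels, np.arange(n_samples)] = 1.0
    Q = (atom_labels[:, None] == sample_labels[None, :]).astype(float)
    return H, Q

def lcksvd_objective(X, D, S, Q, A, H, W, alpha, beta):
    """Sum of the three squared Frobenius-norm error terms (hypothetical names)."""
    reconstruction = np.linalg.norm(X - D @ S, "fro") ** 2
    discriminative = np.linalg.norm(Q - A @ S, "fro") ** 2
    classification = np.linalg.norm(H - W @ S, "fro") ** 2
    return reconstruction + alpha * discriminative + beta * classification
```

For example, `label_matrices([0, 0, 1], [0, 1], 2)` yields a 2×3 label matrix and a 2×3 discriminative matrix whose entries are zero wherever the atom and the sample belong to different classes.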

## 3 Proposed hybrid dictionary learning network (HDLN)

In this section, we elaborate the construction of hybrid dictionary learning network (HDLN). Specifically, in subsection 3.1, we introduce the first layer of the network which is composed of CSDL-SRC. In subsection 3.2, we propose LEDL and let it be the second layer of the network.

### 3.1 The first layer

Given a training sample matrix and a suitable dictionary size , the objective function of the first layer is as follows:

(3) |

where and are the dictionary matrix and sparse codes matrix of the first layer in our proposed HDLN, respectively.

### 3.2 The second layer

We propose a novel class shared dictionary learning method named label embedded dictionary learning (LEDL), which introduces the $\ell_1$-norm regularization term to replace the $\ell_0$-norm regularization of LC-KSVD. The second layer consists of LEDL. Based on the computation above, we explicitly construct a sparse codes matrix from the first layer and make it one of the inputs of the next layer. In addition, the label matrix and the discriminative sparse codes matrix are also introduced to the second layer. After a reasonable dictionary size of LEDL is given, the objective function can be written as follows:

(4) |

where is the dictionary of , is the sparse codes of . The definitions of and in Equation 4 are the same as those in Equation 2.

## 4 Optimization of the objective function

The optimization problems in Equation 3 and Equation 4 are not jointly convex. However, Equation 3 is separately convex in either (with fixed) or (with fixed), and Equation 4 is separately convex in either (with , , fixed), (with , , fixed), (with , , fixed), or (with , , fixed). To this end, we cast the optimization problem as six subproblems of two types: the $\ell_1$-norm regularized least-squares ($\ell_1$-$ls$) minimization subproblem for finding the sparse codes (, ) and the $\ell_2$-norm constrained least-squares ($\ell_2$-$ls$) minimization subproblem for learning the bases (, , , ). The ADMM boyd2011distributed framework is introduced to solve the first type of subproblem, while the BCD liu2014blockwise method is used to address the other subproblems.

### 4.1 Optimization of the first layer

ADMM is usually used to solve equality-constrained problems, whereas the objective function of CSDL-SRC is unconstrained. Thus, the core idea of applying the ADMM framework here is to introduce an auxiliary variable that reformulates the original function as a linear equality-constrained problem. By introducing the auxiliary variable , the in Equation 3 can be substituted by and , so we can rewrite Equation 3 as follows:

(5) |

Then the augmented Lagrangian function of problem (5) with fixed can be written as:

(6) |

where is the augmented Lagrangian multiplier and is the penalty parameter. We can obtain the closed-form solution at each iteration as follows:

(1) Updating while fixing , and :

(7) |

where is the iteration number and means the value of matrix after iteration, the closed form solution of is:

(8) |

the here can be written as:

(9) |

(11) |

the closed form solution of is:

(12) |

the here can be written as:

(13) |

the here can be written as:

(14) |

(3) Updating while fixing , and :

(15) |
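The three ADMM updates above follow the usual pattern for an $\ell_1$-norm regularized least-squares problem with a fixed dictionary: a linear solve for the codes, a soft-thresholding step for the auxiliary variable, and an update of the multiplier. The sketch below is a generic scaled-form ADMM in the style of boyd2011distributed, not a transcription of Equations 7 to 15. It assumes the problem $\min_S \tfrac{1}{2}\|X - BS\|_F^2 + \lambda\|Z\|_1$ subject to $S = Z$, and the names `X`, `B`, `lam`, `rho`, and `U` (the scaled multiplier) are hypothetical.

```python
import numpy as np

def soft_threshold(V, kappa):
    """Element-wise shrinkage operator, the proximal map of kappa * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - kappa, 0.0)

def admm_sparse_codes(X, B, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for 0.5*||X - B S||_F^2 + lam*||Z||_1 with S = Z.

    X: (d, n) samples, B: (d, k) fixed dictionary. Returns Z of shape (k, n).
    """
    k, n = B.shape[1], X.shape[1]
    Z = np.zeros((k, n))   # auxiliary variable carrying the l1 penalty
    U = np.zeros((k, n))   # scaled multiplier
    gram = B.T @ B + rho * np.eye(k)
    BtX = B.T @ X
    for _ in range(n_iter):
        # (1) codes update: ridge-like linear system, closed form
        S = np.linalg.solve(gram, BtX + rho * (Z - U))
        # (2) auxiliary update: soft-thresholding
        Z = soft_threshold(S + U, lam / rho)
        # (3) multiplier update
        U = U + S - Z
    return Z
```

With an identity dictionary the problem separates per entry, so the result should approach the soft-thresholded input, which gives a quick sanity check of the implementation.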

Based on the above ADMM steps, we obtain the closed form solution of , and . Then we utilise BCD method with fixed , and to solve the constrained minimization problem of Equation 5. The objective function can be rewritten as follows:

(16) |

To this end, we can obtain the closed-form solution with respect to each single column as follows:

(4) Updating while fixing , and :

(17) |

the closed form solution of is:

(18) |

the here can be written as:

(19) |

where , denote the row vector of matrix .
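The column-wise update above can be illustrated with a generic blockwise coordinate descent step for the bases, in the spirit of liu2014blockwise. The sketch assumes the subproblem $\min_B \|X - BS\|_F^2$ with each column of $B$ held at unit $\ell_2$ norm; this normalization and the names `X`, `B`, `S`, and `eps` are assumptions for illustration and may differ in detail from Equations 17 to 19.

```python
import numpy as np

def bcd_update_bases(X, B, S, eps=1e-12):
    """One BCD sweep over the columns of B with S fixed.

    For column k, the other columns are held fixed, the residual that
    excludes atom k is correlated with the k-th row of S, and the result
    is rescaled to unit l2 norm. Unused atoms (zero rows of S) are kept.
    """
    B = B.copy()
    for k in range(B.shape[1]):
        previous = B[:, k].copy()
        B[:, k] = 0.0                      # dictionary with atom k removed
        direction = (X - B @ S) @ S[k, :]  # residual times k-th row of S
        norm = np.linalg.norm(direction)
        B[:, k] = direction / norm if norm > eps else previous
    return B
```

Because each column update is the exact minimizer over the unit sphere with the other columns fixed, a sweep started from unit-norm columns should not increase the reconstruction error.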

### 4.2 Optimization of the second layer

Similar to the above procedure, the LEDL problem can be decomposed into the same two types of subproblems as CSDL-SRC, which can be optimized by the ADMM and BCD methods, respectively.

For the subproblem of finding sparse codes, we utilise the ADMM method to optimize the objective function. Hence, Equation 4 with , , fixed can be written as follows:

(20) |

where the definitions and applications of , , and in Equation 20 are similar to those of , , and in Equation 6. Thus, we can obtain the closed-form solution at each iteration as follows:

(1) Updating while fixing , , , and , the closed-form solution of is:

(21) |

where

(22) |

(23) |

(2) Updating while fixing , , , and , the closed-form solution of is:

(24) |

where

(25) |

(26) |

(3) Updating while fixing , , , and , the closed-form solution of is:

(27) |

For the subproblem of learning bases, the BCD method is used to optimize the objective function. Thus, Equation 20 with , and fixed can be rewritten as follows:

(28) |

To this end, we can obtain the closed-form solution with respect to each single column as follows:

(4) Updating while fixing , , , and , the closed-form solution of is:

(29) |

the here can be written as:

(30) |

where .

(5) Updating while fixing , , , and , the closed-form solution of is:

(31) |

the here can be rewritten as:

(32) |

where ;

(6) Updating while fixing , , , and , the closed-form solution of is:

(33) |

the here can be rewritten as:

(34) |

where .

### 4.3 Convergence analysis

The convergence of CSDL-SRC has been demonstrated in liu2016face .

Assume that the result of the objective function after iteration is defined as . Since each subproblem is minimized by the ADMM or BCD method, each update monotonically decreases the corresponding objective function. Because the objective function is bounded below (it is a sum of nonnegative terms) and satisfies Equation (35), it converges.

(35) |
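In practice, the monotone decrease argued above can be checked empirically by recording the objective value after every outer iteration. The helpers below are a minimal sketch of such a check together with a relative-change stopping rule; the names `history`, `slack`, and `tol` are hypothetical, and the tolerance values are illustrative rather than taken from the experiments.

```python
def is_nonincreasing(history, slack=1e-10):
    """True if each recorded objective value is no larger than the previous
    one, up to a small numerical slack."""
    return all(curr <= prev + slack for prev, curr in zip(history, history[1:]))

def has_converged(history, tol=1e-6):
    """True if the last two objective values differ by at most tol,
    relative to the magnitude of the previous value."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) <= tol * max(abs(prev), 1.0)
```

A run whose history fails `is_nonincreasing` usually indicates that an inner solver was stopped too early or that a closed-form update was implemented incorrectly.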

### 4.4 Overall algorithm

The overall updating procedure of our proposed network is summarized in Algorithm 1. Here, is the maximum number of iterations, is a square matrix whose elements are all 1, and indicates the element-wise product. In Algorithm 1, we first update the parameters of the first layer to obtain the sparse codes and dictionary ; then is treated as one of the inputs of the second layer to obtain the corresponding bases , .
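The data flow of the two-layer procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not Algorithm 1 itself: each layer is replaced by plain $\ell_1$ dictionary learning (ADMM for the codes, one BCD sweep for the bases), the first layer is not class specific, and the label information enters the second layer by stacking weighted label matrices under the first-layer codes, a trick borrowed from LC-KSVD. All function names, `w_q`, `w_h`, and the iteration counts are hypothetical.

```python
import numpy as np

def soft_threshold(V, kappa):
    return np.sign(V) * np.maximum(np.abs(V) - kappa, 0.0)

def sparse_codes_admm(X, B, lam, rho=1.0, n_iter=50):
    """Scaled-form ADMM for 0.5*||X - B S||_F^2 + lam*||S||_1, B fixed."""
    k, n = B.shape[1], X.shape[1]
    Z, U = np.zeros((k, n)), np.zeros((k, n))
    gram, BtX = B.T @ B + rho * np.eye(k), B.T @ X
    for _ in range(n_iter):
        S = np.linalg.solve(gram, BtX + rho * (Z - U))
        Z = soft_threshold(S + U, lam / rho)
        U = U + S - Z
    return Z

def update_bases_bcd(X, B, S, eps=1e-12):
    """One column-wise sweep with unit-norm columns, S fixed."""
    B = B.copy()
    for k in range(B.shape[1]):
        previous = B[:, k].copy()
        B[:, k] = 0.0
        direction = (X - B @ S) @ S[k, :]
        norm = np.linalg.norm(direction)
        B[:, k] = direction / norm if norm > eps else previous
    return B

def learn_layer(X, n_atoms, lam, n_outer=10, seed=0):
    """Alternate between the codes and the bases for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    S = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_outer):
        S = sparse_codes_admm(X, B, lam)
        B = update_bases_bcd(X, B, S)
    return B, S

def two_layer_sketch(X, H, Q, k1, k2, lam1, lam2, w_q=1.0, w_h=1.0):
    """First-layer codes S1 become the input of the second layer."""
    B1, S1 = learn_layer(X, k1, lam1)
    # Stack codes with weighted discriminative and label targets (assumption).
    stacked = np.vstack([S1, np.sqrt(w_q) * Q, np.sqrt(w_h) * H])
    B2_stacked, S2 = learn_layer(stacked, k2, lam2, seed=1)
    return B1, S1, B2_stacked, S2
```

In this sketch the rows of `B2_stacked` split into three blocks, corresponding to the dictionary on the first-layer codes, the transformation toward `Q`, and the classifier toward `H`; the proposed method instead updates these bases separately, each with its own norm constraint.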