1 Introduction
Sparse representations have proven very successful in restoration and reconstruction tasks such as compression, denoising, deblurring, inpainting and super-resolution Elad et al. (2010). In essence, they model the data/signal through concise linear combinations drawn from an overcomplete basis or set of elements. This overcomplete set of elements is called the dictionary, and it can either be carefully fixed (experimentally or analytically) or be adapted to the data at hand through learning Tosic and Frossard (2011). The conventional nonconvex optimization problem of dictionary learning for sparse representations is given in Eqn. (1) as follows,
$$\min_{\mathbf{D},\,\mathbf{X}} \; \sum_{i} \left\| \mathbf{y}_i - \mathbf{D}\mathbf{x}_i \right\|_2^2 \quad \text{subject to} \quad \|\mathbf{x}_i\|_0 \le k, \;\; \forall i \qquad (1)$$
where the matrix $\mathbf{D}$ is the designated overcomplete dictionary and $\mathbf{x}_i$ (the $i$-th column of $\mathbf{X}$) is the sparse representation vector of the data point $\mathbf{y}_i$. While minimizing the reconstruction error of $\mathbf{y}_i$ over the dictionary $\mathbf{D}$, each sparse vector $\mathbf{x}_i$ can have at most $k$ nonzero components due to the strict $\ell_0$-norm constraint. In the literature, there exist approximate iterative solutions (namely, sparse coding and dictionary update) to this highly nonconvex problem and its variants Gribonval et al. (2015).
In addition to reconstructive signal processing tasks, dictionary learning can also be employed in machine learning problems such as classification and clustering Akhtar et al. (2016); Oktar and Turkan (2018, 2019). At this point, it is proper to introduce one-class classification, as the fundamental form of the general classification problem, to bridge the gap between reconstructive signal processing and machine learning. Supervised machine learning in the form of classification inherently suggests the existence of more than one label. The concept of one-class learning, also known as one-class or unitary classification, emerges when only a single label exists within the dataset and one needs to discriminate it against all possible unseen labels Moya and Hush (1996). It is actually a special case of binary classification in which there are an “in-class” label and an “out-of-class” label, but the training dataset contains no, or too few, “out-of-class” samples. Therefore, in the absence or scarcity of opposing-class samples, conventional binary classification methods struggle, as they target the decision boundary in between.
One-class learning methods can be categorized by the type of the targeted classifier model. There exist decision-boundary approaches which seek enclosing hyperspheres, hyperplanes or hypersurfaces in general Khan and Madden (2014). These methods can adjust the level of detail through parametrized kernels to cope with over- or underfitting. On the other hand, graph-based methods try to fit a skeleton within the data in a bottom-up manner. As an example, a minimum spanning tree model can be utilized as a one-class classifier Juszczak et al. (2009), in which the classification procedure relies on the distance to the tree. A generalization of graph-based approaches is attained through the concept of a hypergraph, in which a hyperedge can connect more than two data points or vertices. Hypergraph models not only allow custom but also lead the way to heterogeneous dimensionality. Such models are investigated in Wei et al. (2003); Silva and Willett (2008). As detailed in Sec. 2, simplicial learning through an extension of dictionary learning can be thought of as the utmost generalization of the graph-based domain, in which the vertices of a hypergraph can now move freely in space, taking the form of a simplicial.
By definition, an inner-skeleton method seeks a low- and possibly heterogeneous-dimensional piecewise linear model that expresses the data well in a compact manner. Most importantly, the dictionary learning concept can be categorized as an inner-skeleton method. However, the skeleton attained is not bounded in space but rather an infinite one, in which each infinite linear bone is connected to all others at the origin. Technically speaking, a bone corresponds to a linear subspace of arbitrary dimension. This conception will indeed be helpful when dictionary learning is considered within a multiclass classification framework. In its traditional multiclass formulations, the sparse representation based classifier models a separate dictionary for each distinct class through a data fidelity term together with an $\ell_p$-norm regularization constraint on sparse codes ($\ell_0$ or $\ell_1$ in general). Later, the test data is encoded sparsely and classified accordingly, favoring the most reconstructive or representative dictionary Wei et al. (2013). In the absence of other modifications, this form of sparse representation based classifier is known to be generative-only. Generative approaches can create natural random instances of a class, in contrast to discriminative-only methods, which focus on decision boundaries between classes.
In a simplistic manner, one can draw parallels between inner-skeleton and generative formulations, which disregard the existence of other classes, and likewise between decision-boundary and discriminative approaches, which need the existence of opposing classes. Not surprisingly, a method can be both generative and discriminative at the same time. Discrimination, in this sense, arises from the fact that while learning a dictionary (or a model) for a class, the data points from other classes are also taken into consideration, i.e., the distances to those other points are to be maximized. Some examples of discriminative dictionary learning methods can be found in Mairal et al. (2009); Jiang et al. (2013).
There is a subtle but crucial point that goes unnoticed in sparse representation based classifier applications, and it forms the backbone of the study proposed in this paper. As an analogy, the XOR problem of neural networks dictates that a single-layer perceptron is not capable of separating XOR inputs, as only a single linear decision boundary is at hand. This has paved the way for multilayer formulations that can solve linearly nonseparable cases. A similar problem silently haunts dictionary learning methods. Consider the case demonstrated in Fig. 1, in which there are two classes of digit . “Pale class” includes pale images, while “Bright class” contains exactly the same images but brightened up. In technical terms, there are two opposing classes lying on the same subspace in the eyes of linear dictionary learning methods. No matter how discriminative they are, traditional techniques will be incapable of totally distinguishing these two classes. In other words, dictionary learning in its conventional form is insensitive to intensity/magnitude and will never be able to solve problems requiring intensity/magnitude distinction.
This study proposes a new dictionary learning framework for sparse representations through simplicials. While adapting conventional optimization constraints on sparse codes, the developed evolutionary simplicial learning algorithm leads to a strong generative approach. Experimental validation on different classification tasks demonstrates that this generative-only structure can successfully distinguish two different classes lying on the same subspace, while some shortcomings remain when its discriminative power is under consideration. Achieving state-of-the-art performance in most cases appears attainable through further modifications with discriminative elements.
The remaining part of this paper is organized as follows. Sec. 2 introduces the basic concepts and mathematical foundations of simplicial learning as an extension of classical dictionary learning for sparse representations. Then, Sec. 3 details the proposed simplicial learning algorithm, which adopts an evolutionary approach with a fitness function appropriate to the problem. Sec. 4 reports experimental simulations over several datasets and illustrates the results obtained in different classification tasks. Finally, Sec. 5 briefly concludes this study together with possible considerations which can be adapted to strengthen both theoretical and application aspects of the proposed framework.
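The Pale/Bright argument above can be checked numerically on a toy case. The following is a minimal sketch, assuming a hypothetical 2-D data point and a single shared atom (none of these numbers come from the paper): a point and its dimmed copy have the same (zero) residual against an unbounded subspace, whereas a bounded line segment fitted around the bright sample separates them.

```python
# Toy 2-D illustration; all numbers below are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subspace_residual(y, atom):
    """Distance from y to span{atom}, an unbounded line through the origin."""
    scale = dot(y, atom) / dot(atom, atom)   # unconstrained least-squares code
    return sum((yi - scale * ai) ** 2 for yi, ai in zip(y, atom)) ** 0.5

def segment_residual(y, v0, v1):
    """Distance from y to the 1-simplex (line segment) with vertices v0, v1."""
    direction = [b - a for a, b in zip(v0, v1)]
    offset = [yi - a for yi, a in zip(y, v0)]
    t = dot(offset, direction) / dot(direction, direction)
    t = min(1.0, max(0.0, t))                # clamp: the segment is bounded
    closest = [a + t * d for a, d in zip(v0, direction)]
    return sum((yi - ci) ** 2 for yi, ci in zip(y, closest)) ** 0.5

bright = [2.0, 4.0]
pale = [0.5 * c for c in bright]             # same direction, half the intensity
atom = [1.0, 2.0]                            # shared 1-D "dictionary"
bright_segment = ([1.5, 3.0], [2.5, 5.0])    # bounded model of the bright class

# Both samples lie on span{atom}, so the subspace residuals coincide.
subspace_gap = abs(subspace_residual(bright, atom) - subspace_residual(pale, atom))
# Only the bright sample lies on the bounded segment.
segment_gap = segment_residual(pale, *bright_segment) - segment_residual(bright, *bright_segment)
print(subspace_gap, segment_gap)
```

The subspace residuals cannot tell the two samples apart, while the bounded model assigns the pale sample a clearly larger residual.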
2 Simplicial Learning: An Extension of Dictionary Learning
2.1 Definitions
The dictionary learning optimization in Eqn. (1) basically tries to fit a union of subspaces to the data. Such subspaces are of infinite extent and all pass through the origin without offsets; they are designated by the dictionary elements, usually referred to as atoms. Simplicial learning, as an adaptation of dictionary learning, aims instead at fitting bounded generic piecewise linear objects to the data. Table 1 considers certain bounded generic piecewise linear objects. There are many non-equivalent formal definitions of the first construct to be discussed, namely the polytope. This study strictly adheres to the definition that “a polytope is an intact object which admits a simplicial decomposition.” Hence, a polytope is made up of one or more simplices, whereas it remains an open question whether such simplices can be of different dimensions.
There are two possible ways to generalize the concept of a polytope. In the first generalization, connectedness is discarded, so that not a single object but multiple objects are considered at the same time. The second one allows the building blocks, namely the simplices, to have different dimensions, thus leading to heterogeneously dimensional objects. A formal name for such a union of simplices is a simplicial complex, for which restrictions on self-intersections are imposed for a rigorous treatment. By definition, a simplicial complex is a set of simplices satisfying the following two conditions: (i) every face of a simplex from this set is also in this set and (ii) the nonempty intersection of any two simplices is a face of both simplices. Relaxing the formalism slightly, utmost flexibility can be reached by allowing such objects to intersect each other and themselves in arbitrary ways. This final construct is simply named a simplicial in the remaining part of this paper, referring to an arbitrary union of simplices in the most general sense. For a more rigorous treatment of these definitions and related concepts, readers may refer to Munkres (2018).






Polytope  ✗  ✓  ?  ✓  
Simplicial complex  ✓  ✓  ✓  ✗  
Simplicial  ✓  ✓  ✓  ✓ 
2.2 Related work
Simplex and simplicial complex based data applications are becoming popular in the literature as data analysis receives more and more topological consideration Luo et al. (2017); Huang et al. (2015); Belton et al. (2018); Tasaki et al. (2016); Patania et al. (2017). Moreover, utilizing simplices for data applications is not a completely new idea from the perspective of sparse representations Wang et al. (2016); Nguyen et al. (2013). Similarly, this study adopts an adaptation of the sparse representations framework that casts a union of subspaces to a union of simplices. A rigorous mathematical formulation is detailed in the following.
2.3 Mathematical formulation
There are three necessary modifications to make a successful transition from the traditional dictionary learning formulation to simplicial learning. First, an additional sum-to-one constraint is needed on the sparse codes, as noted in Eqn. (2) as follows,
$$\min_{\mathbf{D},\,\mathbf{X}} \; \sum_{i} \left\| \mathbf{y}_i - \mathbf{D}\mathbf{x}_i \right\|_2^2 \quad \text{subject to} \quad \|\mathbf{x}_i\|_0 \le k, \;\; \mathbf{1}^T\mathbf{x}_i = 1, \;\; \forall i \qquad (2)$$
where $\mathbf{1}$ denotes the column vector of ones, of the same size as the sparse vectors $\mathbf{x}_i$. This modification casts $k$-dimensional subspaces into $(k-1)$-dimensional flats, a flat being a $(k-1)$-subspace with an arbitrary offset. A geometric explanation is illustrated in Fig. 2(a-b) for the case $k = 2$. In this example, a subspace solution (i.e., an infinite-extent plane) of sparse representations is reduced to a flat (i.e., an infinite-extent line) by the additional sum-to-one constraint on sparse codes.
In addition to the above constraint, the second necessary modification is an additional nonnegativity constraint on sparse codes, as noted in Eqn. (3) as follows,
$$\min_{\mathbf{D},\,\mathbf{X}} \; \sum_{i} \left\| \mathbf{y}_i - \mathbf{D}\mathbf{x}_i \right\|_2^2 \quad \text{subject to} \quad \|\mathbf{x}_i\|_0 \le k, \;\; \mathbf{1}^T\mathbf{x}_i = 1, \;\; \mathbf{x}_i \ge \mathbf{0}, \;\; \forall i \qquad (3)$$
where $\mathbf{0}$ denotes the column vector of zeros, of the same size as the sparse vectors $\mathbf{x}_i$. Together with the sum-to-one constraint, sparse codes are now restricted to the range $[0, 1]$ in magnitude, and thus the represented flat (an infinite-extent line) turns into a simplex (i.e., a bounded line, or line segment), as apparent in Fig. 2(b-c) for $k = 2$. In the most generic sense, a simplex can be regarded as a bounded flat.
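The effect of the two constraints can be illustrated for two atoms. The sketch below is an illustration only, assuming two hypothetical atoms in the plane; the function names `combine` and `region` are not from the paper. It labels a code by the most restrictive set it belongs to: the subspace (no constraint), the flat (sum-to-one) or the simplex (sum-to-one and nonnegative).

```python
def combine(atoms, code):
    """Linear combination sum_k code[k] * atoms[k]."""
    dim = len(atoms[0])
    return [sum(c * atom[i] for c, atom in zip(code, atoms)) for i in range(dim)]

def region(code, tol=1e-9):
    """Name the most restrictive set that the given code belongs to."""
    sums_to_one = abs(sum(code) - 1.0) <= tol
    nonnegative = all(c >= -tol for c in code)
    if sums_to_one and nonnegative:
        return "simplex"      # bounded line segment between the two atoms
    if sums_to_one:
        return "flat"         # infinite line through the two atoms
    return "subspace"         # whole plane spanned by the two atoms

atoms = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical atoms
for code in ([2.0, 3.0], [1.5, -0.5], [0.3, 0.7]):
    print(code, region(code), combine(atoms, code))
```

With these atoms, every "flat" point lies on the line through the two atoms, and every "simplex" point lies on the segment between them.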
Note that there is no structural constraint on the sparse code patterns in the optimization problems of Eqns. (1)-(3). In other words, all possible combinations of dictionary atoms are available for a sparse vector solution. Since most of these combinations are unnecessary for a given overcomplete dictionary, keeping a set of valid combinations (i.e., forcing certain patterns in sparse codes) will provide a more efficient and more compact representation. This finally leads to the concept of structured sparsity, or group sparsity in exact terms Yuan and Lin (2006); Jacob et al. (2009), as the last modification on the road to simplicial learning.
While referring back to Sec. 1, when positional information is removed from a simplicial, the structure left then corresponds to a hypergraph, in which a hyperedge refers to a specific simplex within the simplicial. In relation to group sparsity, a hyperedge exactly corresponds to a group of atoms, hence a valid pattern of sparse codes. As a consequence, a set of groups/hyperedges, or more technically a hypergraph data structure needs to be kept to define the shape of the simplicial. This hypergraph structure will be denoted as where designates the hyperedge referring to simplex within the simplicial. In accordance with this definition, simplicial learning with a structure imposed by can be formulated in Eqn. (4) as follows,
(4) 
where is the hyperedge indexing the closest simplex for the data point , denotes the dimension of that simplex, and the constraint ensures the group sparsity such that only the optimal group (i.e., hyperedge referring to the closest simplex) in is to be filled and other entries which are represented as shall all be zero. Note here that groups can be not only overlapping but also of different sizes, hence leading to heterogeneous dimensionality. In this final form, needs to be learned together with but a further careful consideration is needed over the compactness of the simplicial in return.
In summary, as is, the optimization in Eqn. (4) is highly ill-posed since there is no restriction on the number of simplices to be used or on the dimensions of those simplices. One could even choose a very high-dimensional simplicial construct and easily zero out the approximation error. Therefore, additional penalty terms based on the number and the dimensionality of simplices need to be investigated for a compact solution. Such a challenge appears to be highly combinatorial in nature, and an evolutionary approach can be adopted after careful consideration of an appropriate fitness function, as detailed in Sec. 3.
3 Evolutionary Approach
To obtain an optimal or a suitable simplicial in a heuristic manner, a certain number of simplicials compete against each other on instances of the same dataset. Basically, an evolutionary approach includes a suitable fitness function to guide this search process, and subprocedures such as mutations and breeding to perform the actual search.
3.1 The fitness function
There are certain critical points to be carefully considered before designating the fitness function for the problem defined in this study. First of all, straightforwardly optimizing the number and the dimensionality of simplices will not be enough to attain the desired compact model. For example, consider data distributed in the shape of a triangle with a certain area. In this case, the triangle with the most compact area should be preferred as the targeted model. However, one could fit a triangle to this data with correct angles but excessive area, in which case neither the dimensionality nor the number of simplices changes. In conclusion, one also needs to take into account the volume, or more technically the content, of the simplicial, besides the number and the dimensionality of simplices.
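The content (or volume) of an arbitrary simplex can be calculated using the Cayley-Menger determinant Li et al. (2015). Let $S$ be a $j$-dimensional simplex in $\mathbb{R}^n$ with vertices $\mathbf{v}_1, \dots, \mathbf{v}_{j+1}$, and let $\mathbf{B} = (\beta_{pq})$ denote the $(j+1) \times (j+1)$ squared-distance matrix of these vertices such that $\beta_{pq} = \|\mathbf{v}_p - \mathbf{v}_q\|_2^2$. Then the content $V_j(S)$ is given by the relation in Eqn. (5) as follows,
$$V_j^2(S) = \frac{(-1)^{j+1}}{2^j \, (j!)^2} \det(\hat{\mathbf{B}}) \qquad (5)$$
where $\hat{\mathbf{B}}$ is the $(j+2) \times (j+2)$ matrix obtained from $\mathbf{B}$ by bordering it with a top row of $(0, 1, \dots, 1)$ and a left column of $(0, 1, \dots, 1)^T$.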
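The content calculation via the Cayley-Menger determinant can be sketched in a few lines. The block below is a minimal pure-Python sketch of the standard formula $V_j^2 = \frac{(-1)^{j+1}}{2^j (j!)^2}\det(\hat{\mathbf{B}})$; the helper names `determinant` and `simplex_content` are chosen here for illustration, and the determinant routine is a plain Gaussian elimination rather than anything prescribed by the paper.

```python
from math import factorial

def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0                      # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def simplex_content(vertices):
    """Content of the j-simplex spanned by j+1 vertices (Cayley-Menger)."""
    j = len(vertices) - 1
    # Squared pairwise distances between vertices.
    sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in vertices]
          for p in vertices]
    # Border with a top row (0, 1, ..., 1) and a left column (0, 1, ..., 1)^T.
    bordered = [[0.0] + [1.0] * (j + 1)]
    bordered += [[1.0] + [float(x) for x in row] for row in sq]
    squared = (-1) ** (j + 1) * determinant(bordered) / (2 ** j * factorial(j) ** 2)
    return max(squared, 0.0) ** 0.5         # guard against round-off below zero

print(simplex_content([[0, 0], [3, 4]]))            # length of a segment
print(simplex_content([[0, 0], [1, 0], [0, 1]]))    # area of a triangle
```

Because only pairwise distances enter the formula, the same routine works for a simplex of any dimension embedded in a space of any (larger) dimension.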
Related to the content calculation, another issue arises because of the heterogeneous dimensionality allowed in the optimization formula. The contents of a line segment and of a triangle (as objects) are incomparable in a general continuous setting, since a triangle itself contains infinitely many line segments. To resolve this problem, an exponential term is introduced through an approximated cumulative discrete content calculation of a simplicial, as given in Eqn. (6) as follows,
(6) 
where denotes the number of hyperedges or equivalently the number of simplices, is the content of the simplex and is the dimension of that simplex. As a content would complicate the exponentiation used, is needed in the discrete approximation.
Having pinned down the above term which will be a component in the fitness function driving the evolutionary process, a fitness function candidate (in a minimization form) is given in Eqn. (7) as follows,
(7) 
where sum of squared error (SSE) used as the data fidelity term and approximated cumulative discrete content as to regulate the compactness of the representation. denotes the regularization parameter controlling the contribution of the compactness prior on the solution.
In initial experiments with the above fitness function, it is observed that the parameter has a very broad optimality range, which changes drastically from dataset to dataset. This is due to a high dynamic range imbalance between the two cumulative terms. Therefore, a variant of the defined fitness function is considered by transforming Eqn. (7) into the logarithmic scale in order to compress the dynamic range, leading to a more natural maximization setting formulated in Eqn. (8) as follows,
(8) 
where denotes the number of data points and the parameter regulates over or underfitting. When , the fitness function simply reduces to the data fidelity term favoring only for the reconstruction quality. Instead, a high value forces the simplicial to be compact. Empirical investigations suggest that a value around could be a global setting as it provides excellent results over all datasets considered in this study. The parameter is fixed to .
3.2 Mutations and breeding
First of all, note that the hypergraph is kept in the form of an incidence matrix of zeros and ones, in which the row count corresponds to the number of simplices and the column count matches the number of vertices, i.e., the number of atoms (columns) in the dictionary $\mathbf{D}$. Mutations can be easily applied to this binary matrix. In detail, there are four main processes that provide the background for evolution: (i) increasing/decreasing the dimension of a simplex, (ii) adding/removing a simplex, (iii) subdividing a simplex and (iv) adding/removing a vertex. All of these mutation operations are performed randomly without any optimality consideration.
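A few of the mutations listed above can be sketched directly on the binary incidence matrix. The following is an illustrative sketch, not the authors' implementation: the function names and the exact mechanics of each mutation (e.g., that a new simplex starts as a single existing vertex) are assumptions. Subdivision and vertex addition/removal are omitted because they would also have to modify the dictionary columns.

```python
import random

def copy_matrix(H):
    return [row[:] for row in H]

def raise_dimension(H, rng):
    """Attach one more existing vertex to a randomly chosen simplex."""
    H = copy_matrix(H)
    row = rng.choice(H)
    free = [c for c, bit in enumerate(row) if bit == 0]
    if free:
        row[rng.choice(free)] = 1
    return H

def lower_dimension(H, rng):
    """Detach one vertex from a randomly chosen simplex."""
    H = copy_matrix(H)
    row = rng.choice(H)
    used = [c for c, bit in enumerate(row) if bit == 1]
    if len(used) > 1:                 # keep at least one vertex per simplex
        row[rng.choice(used)] = 0
    return H

def add_simplex(H, rng):
    """Append a new 0-simplex located at a randomly chosen existing vertex."""
    H = copy_matrix(H)
    new_row = [0] * len(H[0])
    new_row[rng.randrange(len(new_row))] = 1
    H.append(new_row)
    return H

def remove_simplex(H, rng):
    """Delete a randomly chosen simplex, keeping at least one."""
    H = copy_matrix(H)
    if len(H) > 1:
        del H[rng.randrange(len(H))]
    return H

def mutate(H, rng):
    """Apply one random mutation, with no optimality consideration."""
    operation = rng.choice([raise_dimension, lower_dimension,
                            add_simplex, remove_simplex])
    return operation(H, rng)

def dimensions(H):
    """Dimension of each simplex: number of vertices minus one."""
    return [sum(row) - 1 for row in H]

rng = random.Random(0)
H = [[1, 1, 0], [0, 1, 1]]            # two line segments sharing vertex 1
print(dimensions(H), dimensions(mutate(H, rng)))
```

Each function returns a new matrix and leaves its input untouched, so a parent simplicial can be kept alongside its mutated children.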
As an additional tool to assist the search process, breeding of two simplicials is also undertaken, in which both the dictionary elements and the hypergraph structures of the two simplicials are split and then merged appropriately in order to create a new simplicial that is representative of the two parents to a certain extent. Details of the breeding procedure are depicted in Alg. 1. At first, the hypergraph structures and the corresponding dictionary elements are extracted for the two parent simplicials. Then a random submatrix is drawn from each hypergraph together with the corresponding columns of the respective dictionary. While the vertices (atoms) are directly concatenated, the hypergraphs are concatenated in a disjoint manner. In short, two subsimplicials are extracted and then grouped together in a disjoint manner to form a new simplicial. Such a tool can be suitably employed to exploit the underlying dimensionality of the dataset, since these splitting and merging processes may lead child simplicials to acquire a properly representative data dimensionality very quickly, much faster than mutation processes alone. Therefore, as a general observation, breeding determines the core dimensionality of the simplicial and mutations fine-tune the simplicial to the data.
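The breeding step described above can be sketched as follows. This is a hedged sketch rather than a transcription of Alg. 1: the way the random submatrix is drawn (a random nonempty subset of simplices plus the vertices they use) and the names `random_sub_simplicial` and `breed` are assumptions. A simplicial is represented here as a pair of a list of atoms (vertex vectors) and a binary incidence matrix in which every row has at least one nonzero entry.

```python
import random

def random_sub_simplicial(atoms, H, rng):
    """Pick a random nonempty subset of simplices and the vertices they use."""
    n_rows = len(H)
    rows = sorted(rng.sample(range(n_rows), rng.randint(1, n_rows)))
    cols = [c for c in range(len(H[0])) if any(H[r][c] for r in rows)]
    sub_H = [[H[r][c] for c in cols] for r in rows]
    sub_atoms = [atoms[c] for c in cols]
    return sub_atoms, sub_H

def breed(parent_a, parent_b, rng):
    """Child simplicial: concatenated atoms, block-diagonal (disjoint) hypergraph."""
    atoms_a, H_a = random_sub_simplicial(*parent_a, rng)
    atoms_b, H_b = random_sub_simplicial(*parent_b, rng)
    child_atoms = atoms_a + atoms_b                       # direct concatenation
    child_H = [row + [0] * len(atoms_b) for row in H_a]   # upper-left block
    child_H += [[0] * len(atoms_a) + row for row in H_b]  # lower-right block
    return child_atoms, child_H

# Hypothetical parents: two segments sharing a vertex, and a single segment.
parent_a = ([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [[1, 1, 0], [0, 1, 1]])
parent_b = ([[5.0, 5.0], [6.0, 5.0]], [[1, 1]])
child_atoms, child_H = breed(parent_a, parent_b, random.Random(0))
print(len(child_atoms), child_H)
```

The block-diagonal layout guarantees that no simplex of the child mixes vertices from both parents, which is one way to read "concatenated in a disjoint manner".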
3.3 Implementation details
The algorithm to learn an evolutionary simplicial model on a set of data points stored in the columns of a data matrix is given in Alg. 2. First, the initial simplicial is generated from the given data points. It is observed that choosing a single point (i.e., the centroid of the dataset) as the initial simplicial is sufficient for low-dimensional problems. Through mutation and breeding processes, the initial simplicial quickly takes an appropriate form since the search space is relatively small. For high-dimensional problems, however, a procedure involving the $k$-means algorithm Jain (2010) as a subroutine is employed to designate the initial simplicial. In such cases, starting from a single point greatly slows down the process of evolution since the search space is quite large. Hence, an initialization based on $k$-means ensures that the starting simplicial is already a relatively fit one. A last point worth mentioning regarding initialization is that the initial simplicial should satisfy the condition that the numerator of Eqn. (8) is positive, so as to lead to a meaningful evolution.
In the sparse coding step of Alg. 2, the algorithm projects the data points onto each simplex of the simplicial Duchi et al. (2008); Golubitsky et al. (2012), which basically corresponds to the sparse coding optimization. The closest simplex for a data point is determined through the minimum approximation error obtained after projecting onto each simplex. The positive barycentric coordinates of the projection points, which correspond to the sparse codes, are acquired, and the necessary entries of the sparse representation matrix $\mathbf{X}$ are then filled accordingly.
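The projection-based sparse coding step can be sketched as below. This is only one possible realization and is not claimed to be the solver used in the paper: it runs projected gradient descent on the barycentric coordinates, using the sort-based Euclidean projection onto the probability simplex as the projection routine. The function names, the step-size rule and the iteration count are assumptions made for illustration.

```python
def project_to_probability_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t                 # the last k satisfying the test wins
    return [max(x - theta, 0.0) for x in v]

def barycentric_projection(y, vertices, steps=500):
    """Approximate argmin_a ||y - sum_k a_k v_k||^2 over the probability simplex."""
    m = len(vertices)
    a = [1.0 / m] * m
    # 2 * (squared Frobenius norm) upper-bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(c * c for v in vertices for c in v) or 1.0

    def point(coords):
        return [sum(coords[k] * vertices[k][i] for k in range(m))
                for i in range(len(y))]

    for _ in range(steps):
        residual = [p - yi for p, yi in zip(point(a), y)]
        grad = [2.0 * sum(r * c for r, c in zip(residual, v)) for v in vertices]
        a = project_to_probability_simplex(
            [ak - g / lipschitz for ak, g in zip(a, grad)])
    error = sum((p - yi) ** 2 for p, yi in zip(point(a), y)) ** 0.5
    return a, error

def sparse_code(y, atoms, hyperedges):
    """Code y on its closest simplex; entries outside that hyperedge stay zero."""
    best = None
    for index, edge in enumerate(hyperedges):
        coords, error = barycentric_projection(y, [atoms[k] for k in edge])
        if best is None or error < best[2]:
            best = (index, coords, error)
    index, coords, error = best
    code = [0.0] * len(atoms)
    for k, value in zip(hyperedges[index], coords):
        code[k] = value
    return index, code, error

# Hypothetical simplicial: a segment (atoms 0 and 1) and an isolated point (atom 2).
atoms = [[0.0, 0.0], [2.0, 0.0], [0.0, 5.0]]
hyperedges = [[0, 1], [2]]
print(sparse_code([0.5, 1.0], atoms, hyperedges))
```

Here each hyperedge is stored as a list of atom indices, which is equivalent to one row of the incidence matrix; a dedicated point-to-simplex distance algorithm could replace the iterative solver.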
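In the dictionary update step of Alg. 2, the dictionary matrix $\mathbf{D}$ is updated using a direct least-squares solution. Setting the derivative of the reconstruction error with respect to $\mathbf{D}$ to zero yields the analytic solution $\mathbf{D} = \mathbf{Y}\mathbf{X}^{\dagger}$, where $\mathbf{Y}$ is the data matrix and $\mathbf{X}^{\dagger}$ represents the Moore-Penrose pseudoinverse of $\mathbf{X}$. Note that there is no evolutionary process for learning $\mathbf{D}$, namely the vertices of the simplicial. Instead, the vertices are updated exactly once, at this step, in each iteration of the algorithm.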
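For completeness, the least-squares dictionary update can be derived in a few lines. The derivation below is a standard sketch under the assumption that the data matrix is denoted $\mathbf{Y}$ and that the reconstruction error is measured in the Frobenius norm; the closed form with an explicit inverse additionally assumes that $\mathbf{X}\mathbf{X}^T$ is invertible.

```latex
\begin{aligned}
E(\mathbf{D}) &= \left\| \mathbf{Y} - \mathbf{D}\mathbf{X} \right\|_F^2, \\
\frac{\partial E}{\partial \mathbf{D}} &= -2\,(\mathbf{Y} - \mathbf{D}\mathbf{X})\,\mathbf{X}^T = \mathbf{0}
\;\;\Longrightarrow\;\; \mathbf{D}\,\mathbf{X}\mathbf{X}^T = \mathbf{Y}\mathbf{X}^T, \\
\mathbf{D} &= \mathbf{Y}\mathbf{X}^T \left( \mathbf{X}\mathbf{X}^T \right)^{-1} = \mathbf{Y}\mathbf{X}^{\dagger}.
\end{aligned}
```

When $\mathbf{X}\mathbf{X}^T$ is singular (for instance, when some atom is never used by any sparse code), $\mathbf{Y}\mathbf{X}^{\dagger}$ still gives the minimum-norm least-squares solution.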
Finally, the surviving simplicials are determined based on the fitness scores they attain (line ). Experimental trials suggest that keeping the population size at is an efficient strategy, while an iteration count of is sufficient instead of a full convergence. Notice here that the parent simplicials are to be kept in the population pool when their fitness scores are higher than their children’s.
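The survivor selection described above can be sketched as a simple truncation step. This is an assumption-laden illustration: the function name `select_survivors`, the pooling of parents with children and the tie-breaking rule are choices made here, and `fitness` stands for any scoring function to be maximized, such as the one in Eqn. (8).

```python
def select_survivors(parents, children, fitness, population_size):
    """Keep the fittest candidates; parents survive if they beat their children."""
    pool = list(parents) + list(children)
    # sorted() is stable, so parents listed first win ties against children.
    ranked = sorted(pool, key=fitness, reverse=True)
    return ranked[:population_size]

# Toy usage with numbers standing in for simplicials and identity as fitness.
survivors = select_survivors([3.0, 1.0], [2.0, 5.0, 0.5], lambda s: s, 3)
print(survivors)
```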
4 Experimental Results
The proposed method is tested in two phases of experiments to evaluate its classification capabilities. In the first experimental setup, the performance is evaluated in a one-class classification task for outlier detection. In such outlier detection problems, datasets contain a certain proportion of outliers, and methods learn models in an unsupervised manner, agnostic of data labels. In the second classification task, the performance of the proposed method is evaluated in a multiclass setting. At this stage, seven synthetic multiclass datasets are generated in addition to two handwritten digit recognition datasets. The synthetic datasets are special in that they contain cases which require intensity/magnitude distinction and are thus very challenging for conventional dictionary learning methods.
Dataset  #Samples  #Dimensions  Outlier Ratio (%) 

arrhythmia  
cardio  
glass  
ionosphere  
letter  
lympho  
mnist  
musk  
optdigits  
pendigits  
pima  
satellite  
satimage2  
shuttle  
vertebral  
vowels  
wbc 
4.1 Outlier detection
In total, 17 benchmark datasets are taken from the ODDS Library Rayana (2016) for the one-class learning task. Information regarding these datasets in terms of number of samples, sample dimensionality and outlier percentage is summarized in Table 2, and interested readers may refer to Rayana (2016) for details about each individual dataset. Using these benchmark datasets, a random to train-test set split is repeated for independent simulations and the mean Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) results are reported in Table 3.
The proposed Evolutionary Simplicial Learning (ESL) method is evaluated against an extensive outlier detection benchmark named PyOD Zhao et al. (2019). The competing methods include Angle-based Outlier Detector (ABOD) Kriegel et al. (2008), Clustering-based Local Outlier Factor (CBLOF) He et al. (2003), Feature Bagging (FB) Lazarevic and Kumar (2005), Histogram-based Outlier Score (HBOS) Goldstein and Dengel (2012), Isolation Forest (IForest) Liu et al. (2008), K Nearest Neighbors (KNN) Ramaswamy et al. (2000), Local Outlier Factor (LOF) Breunig et al. (2000), Minimum Covariance Determinant (MCD) Hardin and Rocke (2004), One-class Support Vector Machine (OCSVM) Scholkopf et al. (2001) and Principal Component Analysis (PCA) Shyu et al. (2003), together with one of the most recent results obtained in Weng et al. (2018) on the same benchmark (with an average of runs for each dataset).
Dataset  ABOD  CBLOF  FB  HBOS  IForest  KNN  LOF  MCD  OCSVM  PCA  Weng et al. (2018)  ESL 

arrhythmia  
cardio  
glass    
ionosphere  
letter    
lympho  
mnist  
musk  
optdigits    
pendigits  
pima    
satellite  
satimage2  
shuttle  
vertebral  
vowels    
wbc    
MEAN  n/a  
STDEV  n/a 
The last two rows of Table 3 show the mean AUC ROC results over all datasets and their standard deviations. ESL not only presents the best average AUC ROC performance among all methods in the benchmark but also has the lowest standard deviation. One can conclude that it is the most reliable method among the considered techniques for this performance measure. Moreover, ESL shows top AUC ROC performance on three datasets. However, additional tests show that it does not have a noticeable advantage in Precision at n (P@n) performance.
4.2 Multiclass classification
For the multiclass classification task, six challenging synthetic datasets are generated by following the procedures in [1], and these datasets are depicted in Fig. 3. Four of these datasets (namely ClusterinCluster, TwoSpirals, HalfKernel and Crescent&Fullmoon) contain binary classification tasks, while the remaining two (Corners and Outliers) consist of four-class classification problems. In addition, a synthetically altered dataset (named as MNIST) is included in the experimental setup, in which all samples of the digit from the original MNIST LeCun et al. (2010) are designated as the “Bright class”, while a new “Pale class” is generated from all these original samples by dimming with a scale of according to the previous discussion related to Fig. 1.
Dataset  SRC  LC-KSVD1  LC-KSVD2  DLSI  FDDL  DL-COPAR  LRSDL  ESL 

ClusterinCluster  
TwoSpirals  
HalfKernel  
Crescent&Fullmoon  
Corners  
Outliers  
MNIST8 
The proposed ESL algorithm in this setup is compared against Sparse Representation-based Classification (SRC) Wright et al. (2008), Label Consistent K-SVD (LC-KSVD1 and LC-KSVD2) Jiang et al. (2013), Dictionary Learning with Structured Incoherence (DLSI) Ramirez et al. (2010), Fisher Discrimination Dictionary Learning (FDDL) Yang et al. (2011), Dictionary Learning for Commonality and Particularity (DL-COPAR) Kong and Wang (2012) and Low-rank Shared Dictionary Learning (LRSDL) Vu and Monga (2016, 2017). Experimental results in terms of classification success rates are presented in Table 4. It is apparent that ESL outperforms all considered dictionary learning methods in all cases. This is not a surprising result, since all the synthetic datasets used require intensity/magnitude distinction to various extents. On the other hand, some discriminative methods such as LC-KSVD2, FDDL and LRSDL undergo meaningful learning (i.e., better than random) on some datasets. This observation leads to the important conclusion that discriminative modifications may alleviate insensitivity to intensity to a certain degree.
Fig. 3 depicts examples of learned simplicial models on the six synthetic datasets. As can be observed clearly, simplicials are bounded and composed of simplices (i.e., points and line segments in these cases) with arbitrary offsets, providing an advantage over unbounded and offset-free dictionary learning models in all these classification tasks.
Generative-only  Discriminative  

Dataset  SDLG  TDDLG  LLC  LDL  ESL  KNN  SVMGauss  SDLD  FDDL  TDDLD 
USPS  
MNIST       
Digit Classification: In most practical pattern recognition applications, the pattern, or rather the direction, of the feature vector plays an important role in the success rate. For instance, a “star pattern” is a “star pattern” no matter how bright or pale it is. Therefore, the advantage of simplicial learning over dictionary learning is expected to diminish in some real-world applications. This is observable in digit classification experiments featuring the USPS Hull (1994) and MNIST datasets, as reported in Table 5. In this set of experiments, ESL is compared to classification methods including Supervised Dictionary Learning Mairal et al. (2009) with generative training (SDLG) and with discriminative learning (SDLD), Task-driven Dictionary Learning Mairal et al. (2011): unsupervised (TDDLG) and supervised (TDDLD), FDDL, KNN, Gaussian SVM, Locality-constrained Linear Coding (LLC) Wang et al. (2010) and Locality-sensitive Dictionary Learning (LDL) Wei et al. (2013). LLC and LDL methods have the sum-to-one constraint on sparse codes; therefore they learn spaces with arbitrary offsets, but the learned models are still not bounded (as there is no nonnegativity constraint).
As apparent from Table 5, ESL appears to be a successful generative-only method which performs nearly at the capacity of Gaussian SVM (i.e., a well-known and widely used discriminative classifier). However, it cannot outperform discriminative dictionary learning methods such as FDDL and TDDLD on these datasets. A final note is that ESL can also be modified through discriminative elements. The discriminative methods SDLD and TDDLD have an advantage over their generative counterparts SDLG and TDDLG. Hence, a successful discriminative version of ESL can be projected to reach the state of the art, an estimate open to discussion or further investigation.
5 Discussion and Conclusion
Dictionary learning through simplicials is more flexible than classical dictionary learning models since simplices are bounded and freely positioned in space. The proposed sparsity-based evolutionary structure, called ESL, is highly applicable if the characteristics of the problem at hand require such localized models. In this study, a global fitness function is employed and there is no restriction on the local fitness of each individual simplex within the simplicial. If the local fitness of each simplex is considered and optimized individually, the resulting simplicial model might take a more compact form. For example, the unnecessary simplex of the green simplicial in Fig. 3(c) would most probably be eliminated as it does not have any local fitness, thus leading to an increased classification accuracy. Another point worth mentioning is that the fitness function employed in Eqn. (8) is reminiscent of the Poisson distribution in a multidimensional form Belyaev and Lumen’skii (1988). Hence, other probabilistic considerations and also discriminative elements can be adapted to strengthen both theoretical and application aspects of the proposed framework.
As exemplified in this paper, simplicial learning can successfully address some weak points of conventional dictionary learning for the considered machine learning problems; it is a promising approach inherently capable of performing signal processing tasks and can become a general machine learning tool with many application domains.
References
 [1] 6 functions for generating artificial datasets - File Exchange - MATLAB Central. Accessed 2019-10-09. Cited by: §4.2.
 Discriminative Bayesian dictionary learning for classification. IEEE Trans. Patt. Anal. Mach. Intell. 38 (12), pp. 2374–2388. Cited by: §1.
 Learning simplicial complexes from persistence diagrams. In Conf. Comput. Geometry, pp. 18. Cited by: §2.2.
 Multidimensional Poisson walks. J. Soviet Math. 40 (2), pp. 162–165. Cited by: §5.
 LOF: identifying density-based local outliers. In ACM SIGMOD Record, Vol. 29, pp. 93–104. Cited by: §4.1.
 Efficient projections onto the l1-ball for learning in high dimensions. In Int. Conf. Mach. Learn., pp. 272–279. Cited by: §3.3.
 On the role of sparse and redundant representations in image processing. Proc. IEEE 98 (6), pp. 972–982. Cited by: §1.
 Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. KI-2012: Poster and Demo Track, pp. 59–63. Cited by: §4.1.
 An algorithm to compute the distance from a point to a simplex. Commun. Comput. Algebra 46, pp. 57–57. Cited by: §3.3.
 Sample complexity of dictionary learning and other matrix factorizations. IEEE Trans. Inf. Theory 61 (6), pp. 3469–3486. Cited by: §1.
 Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Stat. Data Anal. 44 (4), pp. 625–638. Cited by: §4.1.
 Discovering cluster-based local outliers. Pattern Recog. Lett. 24 (9–10), pp. 1641–1650. Cited by: §4.1.
 A new simplex sparse learning model to measure data similarity for clustering. In Int. Joint Conf. Artif. Intell., pp. 3569–3575. Cited by: §2.2.
 A database for handwritten text recognition research. IEEE Trans. Patt. Anal. Mach. Intell. 16 (5), pp. 550–554. Cited by: §4.2.
 Group lasso with overlap and graph lasso. In Int. Conf. Mach. Learn., pp. 433–440. Cited by: §2.3.
 Data clustering: 50 years beyond K-means. Pattern Recog. Lett. 31 (8), pp. 651–666. Cited by: §3.3.
 Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans. Patt. Anal. Mach. Intell. 35 (11), pp. 2651–2664. Cited by: §1, §4.2.
 Minimum spanning tree based one-class classifier. Neurocomput. 72 (7–9), pp. 1859–1869. Cited by: §1.
 One-class classification: taxonomy of study and review of techniques. The Know. Eng. Review 29 (3), pp. 345–374. Cited by: §1.
 A dictionary learning approach for classification: separating the particularity and the commonality. In European Conf. Comp. Vis., pp. 186–199. Cited by: §4.2.
 Angle-based outlier detection in high-dimensional data. In Int. Conf. Knowledge Discovery Data Mining, pp. 444–452. Cited by: §4.1.
 Feature bagging for outlier detection. In Int. Conf. Knowledge Discovery Data Mining, pp. 157–166. Cited by: §4.1.
 MNIST Handwritten Digit Database. Cited by: §4.2.
 Simplex volume analysis for finding endmembers in hyperspectral imagery. In Satellite Data Comp. Commun. Process. XI, Vol. 9501, pp. 950107. Cited by: §3.1.
 Isolation forest. In IEEE Int. Conf. Data Mining, pp. 413–422. Cited by: §4.1.
 Learning discriminative activated simplices for action recognition. In AAAI Conf. Artif. Intell., pp. 4211–4217. Cited by: §2.2.
 Task-driven dictionary learning. IEEE Trans. Patt. Anal. Mach. Intell. 34 (4), pp. 791–804. Cited by: §4.2.
 Supervised dictionary learning. In Adv. Neural Inf. Process. Syst., pp. 1033–1040. Cited by: §1, §4.2.
 Network constraints and multi-objective optimization for one-class classification. Neural Networks 9 (3), pp. 463–474. Cited by: §1.
 Analysis on manifolds. CRC Press. Cited by: §2.1.
 Simplicial nonnegative matrix factorization. In Int. Conf. Comput. Commun. Tech.Res. Innov. Vis. Fut., pp. 47–52. Cited by: §2.2.
 A review of sparsity-based clustering methods. Signal Process. 148, pp. 20–30. Cited by: §1.
 K-polytopes: a superproblem of k-means. Signal, Image, Video Process. 13 (6), pp. 1207–1214. Cited by: §1.
 Topological analysis of data. EPJ Data Sci. 6 (1), pp. 7. Cited by: §2.2.
 Efficient algorithms for mining outliers from large data sets. In ACM SIGMOD Record, Vol. 29, pp. 427–438. Cited by: §4.1.
 Classification and clustering via dictionary learning with structured incoherence and shared features. In IEEE Conf. Comp. Vis. Patt. Recog., pp. 3501–3508. Cited by: §4.2.
 ODDS library. Stony Brook Univ., Dept. of Computer Sci. Cited by: §4.1.
 Estimating the support of a high-dimensional distribution. Neural Computation 13 (7), pp. 1443–1471. Cited by: §4.1.
 A novel anomaly detection scheme based on principal component classifier. In Int. Conf. Data Mining. Cited by: §4.1.
 Hypergraph-based anomaly detection of high-dimensional co-occurrences. IEEE Trans. Patt. Anal. Mach. Intell. (3), pp. 563–569. Cited by: §1.
 Simplex-based dimension estimation of topological manifolds. In Int. Conf. Patt. Recog., pp. 3609–3614. Cited by: §2.2.
 Dictionary learning: what is the right representation for my signal?. IEEE Signal Process. Mag. 28, pp. 27–38. Cited by: §1.
 Learning a low-rank shared dictionary for object classification. In IEEE Int. Conf. Image Process., pp. 4428–4432. Cited by: §4.2.
 Fast low-rank shared dictionary learning for image classification. IEEE Trans. Image Process. 26 (11), pp. 5160–5175. Cited by: §4.2.
 Recognizing actions in 3D using action-snippets and activated simplices. In AAAI Conf. Artif. Intell., pp. 3604–3610. Cited by: §2.2.
 Locality-constrained linear coding for image classification. In IEEE Conf. Comp. Vis. Patt. Recog., pp. 3360–3367. Cited by: §4.2.
 Locality-sensitive dictionary learning for sparse representation based classification. Pattern Recog. 46 (5), pp. 1277–1287. Cited by: §1, §4.2.
 HOT: hypergraph-based outlier test for categorical data. In Pacific-Asia Conf. Know. Discov. Data Mining, pp. 399–410. Cited by: §1.
 Multi-agent-based unsupervised detection of energy consumption anomalies on smart campus. IEEE Access 7, pp. 2169–2178. Cited by: §4.1, Table 3.
 Robust face recognition via sparse representation. IEEE Trans. Patt. Anal. Mach. Intell. 31 (2), pp. 210–227. Cited by: §4.2.
 Fisher discrimination dictionary learning for sparse representation. In Int. Conf. Comp. Vis., pp. 543–550. Cited by: §4.2.
 Model selection and estimation in regression with grouped variables. J. Royal Stat. Soc. B 68 (1), pp. 49–67. Cited by: §2.3.
 PyOD: a Python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20, pp. 1–7. Cited by: §4.1.