1 Introduction
Texture Characterization of Bone radiograph images (TCB) is a challenge in osteoporosis diagnosis organized at the International Symposium on Biomedical Imaging (ISBI) 2014. The goal of this challenge is to identify osteoporotic cases among healthy controls on 2D bone radiograph images, using texture analysis. The dataset consists of two populations: 87 control subjects (CT, Figure 1, left) and 87 patients with osteoporotic fractures (OP, Figure 1, right).
As illustrated by Figure 1, textured images of the bone microarchitecture of osteoporotic and healthy subjects are very similar, making the challenge's task highly difficult.
2 Features for textures
In our submissions to the ISBI challenge on texture classification, we did not look for complicated application-specific features or for a fancy feature selection algorithm. We rather focused on two simple types of features, namely covariance matrices and wavelet marginals. These submissions aimed at evaluating previously studied features on a real-life application.
2.1 Covariance matrices
Covariance matrices have been studied as image descriptors in a wide variety of applications, from license plate detection [9] to pedestrian detection [16].
For an image or a region of an image, this approach consists in computing local features (usually statistical properties) at every pixel. Then, given the $n$ local descriptors $x_1, \dots, x_n \in \mathbb{R}^d$, the unbiased empirical estimator of the covariance matrix is computed as:

$$C = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top \quad (1)$$

with $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ being the empirical mean of the $x_i$. Note that this estimator will be accurate provided that the number of samples is large enough compared to the number of features.
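As a minimal sketch (in Python with NumPy rather than the MATLAB used in our experiments), the estimator of Eq. (1) can be computed from a matrix of per-pixel features:

```python
import numpy as np

def covariance_descriptor(features):
    """Unbiased empirical covariance of per-pixel features.

    features: (n_pixels, d) array, one d-dimensional local
    feature vector per pixel.
    """
    mu = features.mean(axis=0)             # empirical mean
    centred = features - mu
    # divide by n - 1 for the unbiased estimator of Eq. (1)
    return centred.T @ centred / (features.shape[0] - 1)
```

This is equivalent to `np.cov(features.T)`, shown here only to make the estimator explicit.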
However, this estimator is well known for its sensitivity to outliers. To overcome this issue, a robust estimator, the Minimum Covariance Determinant (MCD), has been introduced in [11]. Basically, MCD aims at finding the $h$ observations (out of $n$) whose covariance matrix has the lowest determinant. Even if it recently suffered some controversies about its convergence properties, we use the FastMCD algorithm [12], which has been proposed to approximate the MCD estimator. We use the implementation provided in the LIBRA toolbox for MATLAB [17] (available at http://wis.kuleuven.be/stat/robust/LIBRA/LIBRAhome). In our experiments, when using the FastMCD algorithm, we set the ratio $h/n$ (which controls the fraction of outliers the algorithm tolerates) and the number of trial subsamples drawn from the dataset.
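Our experiments used the LIBRA toolbox; as an illustrative alternative, scikit-learn's `MinCovDet` also implements FastMCD. A sketch, where the toy data, the 0.75 support fraction and the 10% outlier proportion are arbitrary illustration choices, not the settings of our experiments:

```python
import numpy as np
from sklearn.covariance import MinCovDet

# toy data: 3-D standard Gaussian samples with 10% gross outliers
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:20] += 10.0

# support_fraction plays the role of h/n: the fraction of
# observations over which the covariance determinant is minimised
mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
robust_cov = mcd.covariance_       # outliers largely discarded
empirical_cov = np.cov(X.T)        # heavily distorted by outliers
```

On such contaminated data, the robust estimate stays close to the true (identity) covariance while the empirical one does not.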
Concerning the local features used for computing a covariance matrix, several choices exist. We used two variants of features from the literature. The first one is gradient based:

$$f(x, y) = \left[ I,\ |I_x|,\ |I_y|,\ |I_{xx}|,\ |I_{yy}|,\ \arctan\frac{|I_y|}{|I_x|} \right] \quad (2)$$

where $I$ is the intensity of the pixel, $I_x$, $I_y$, $I_{xx}$, $I_{yy}$ are the intensity derivatives (first and second order along the $x$ and $y$ axes) and the last term is the edge orientation (note that, contrary to the cited papers, we did not use the pixel coordinates, as they do not make sense for texture analysis and gave poor results), leading to a $6 \times 6$ covariance matrix. The second variant replaces the derivatives with the responses of a bank of Gabor filters [8, 14].
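A sketch of how the per-pixel feature vector of Eq. (2) can be assembled (Python/NumPy, with finite differences standing in for whatever derivative filters an implementation might use):

```python
import numpy as np

def gradient_features(img):
    """Per-pixel features of Eq. (2): intensity, |Ix|, |Iy|,
    |Ixx|, |Iyy| and the edge orientation."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)          # first-order derivatives
    Iyy, _ = np.gradient(Iy)           # second order along y
    _, Ixx = np.gradient(Ix)           # second order along x
    eps = 1e-12                        # avoid division by zero
    orientation = np.arctan(np.abs(Iy) / (np.abs(Ix) + eps))
    feats = np.stack([img, np.abs(Ix), np.abs(Iy),
                      np.abs(Ixx), np.abs(Iyy), orientation],
                     axis=-1)
    return feats.reshape(-1, 6)        # (n_pixels, 6)
```

The resulting (n_pixels, 6) matrix is what the covariance estimator above is applied to.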
As already noticed in the literature [15, 9, 10, 16], covariance matrices belong to a non-Euclidean space where distances are not computed along straight lines but rather along curves (namely geodesics). Hence, using tools from Riemannian geometry, it is possible to analyse those structured data. For example, given two nonsingular covariance matrices $C_1$ and $C_2$, the Riemannian distance between them is:

$$\delta_R(C_1, C_2) = \left\| \log\!\left( C_1^{-1/2}\, C_2\, C_1^{-1/2} \right) \right\|_F \quad (4)$$

with $\|\cdot\|_F$ the Frobenius norm and $\log(\cdot)$ the matrix principal logarithm.
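The distance of Eq. (4) can be computed without explicit matrix square roots, using the standard identity that it equals the root sum of squared logarithms of the generalized eigenvalues of the pair; a sketch:

```python
import numpy as np
from scipy.linalg import eigvalsh

def riemannian_distance(C1, C2):
    """Affine-invariant Riemannian distance of Eq. (4), via
    d(C1, C2) = sqrt(sum_i log(l_i)^2), where the l_i are the
    generalized eigenvalues of the pencil (C2, C1)."""
    lam = eigvalsh(C2, C1)             # solves C2 v = lam * C1 v
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

This form makes the affine invariance of the distance (invariance under congruence by any invertible matrix) easy to verify numerically.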
Recently, some authors investigated the use of such features for EEG signals and proposed to use different kernels from the literature to handle them [1, 18]. We intend to apply those studies to covariance matrices computed on images.
In our experiments on textures (and coherently with the results in [18]), normalized Log-Euclidean kernels showed the best performance in a leave-one-out cross-validation. For two nonsingular covariance matrices $C_1$ and $C_2$, this kernel is defined as:

$$k(C_1, C_2) = \frac{\operatorname{tr}\left[ \log_{C_{\mathrm{ref}}}(C_1)\, \log_{C_{\mathrm{ref}}}(C_2) \right]}{\left\| \log_{C_{\mathrm{ref}}}(C_1) \right\|_F \left\| \log_{C_{\mathrm{ref}}}(C_2) \right\|_F} \quad (5)$$

with $\log_{C_{\mathrm{ref}}}(C) = \log\!\left( C_{\mathrm{ref}}^{-1/2}\, C\, C_{\mathrm{ref}}^{-1/2} \right)$ and the parameter $C_{\mathrm{ref}}$ being a nonsingular covariance matrix. This kernel can be understood as a normalized scalar product in the tangent space around $C_{\mathrm{ref}}$ (a local linear approximation of the Riemannian manifold, that is, a Euclidean space onto which the data are mapped by the logarithm application, see Figure 2) to the space of positive definite matrices. As the property demonstrated in [18] suggests, the choice of $C_{\mathrm{ref}}$ induces a deformation of the shapes in the feature space. So far, two heuristics have been used: either the identity matrix or the Riemannian mean [7] of the learning set. This deformation of the geometry induced by the use of a Log-Euclidean kernel is illustrated in Figure 3. Hence, it seems reasonable to use the Riemannian mean, as it may reduce the amount of distortion induced by flattening the manifold.
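A sketch of the tangent-space mapping and of the normalized kernel of Eq. (5); eigendecompositions are used for the matrix powers and logarithms, which is valid for symmetric positive-definite inputs:

```python
import numpy as np

def _spd_pow(S, p):
    """Matrix power of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * w ** p) @ V.T

def log_map(C, Cref):
    """Map C to the tangent space at Cref:
    log(Cref^-1/2 C Cref^-1/2)."""
    P = _spd_pow(Cref, -0.5)
    M = P @ C @ P
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def normalized_logeuclidean_kernel(C1, C2, Cref):
    """Normalized scalar product in the tangent space at Cref."""
    S1, S2 = log_map(C1, Cref), log_map(C2, Cref)
    return np.trace(S1 @ S2) / (np.linalg.norm(S1) * np.linalg.norm(S2))
```

By construction the kernel lies in [-1, 1] and equals 1 whenever its two arguments coincide (and differ from the reference, which would map to the zero matrix).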
2.2 Wavelet marginals
Wavelet marginals are signal and image descriptors based on the wavelet decomposition. This feature has been developed to extract frequency information for translation-invariant classification of biomedical signals [3] and textures [19].
Before delving into the description of this feature, let us briefly introduce some notation in the context of one-dimensional signals (for a comprehensive review of wavelet decomposition, the reader should refer to [6]). Let $\psi_\theta$ be a mother wavelet (whose shape is parametrized by $\theta$). We denote by $\psi_{\theta, s, \tau}$ the wavelet obtained from the mother wavelet after a dilation at scale $s$ and a translation $\tau$.
As originally described in [3], it is possible to extract the information contained in some frequency bands using wavelet marginals. For a signal $x$ and for every scale $s$, this feature is defined as:

$$m_s(x) = \sum_{\tau} \left| \langle x, \psi_{\theta, s, \tau} \rangle \right| \quad (6)$$

For a given signal of size $N = 2^J$, it is possible to extract at most $J$ marginals. As illustrated in Figure 4, every marginal extracts the information contained in a given frequency band of the signal.
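For a Haar mother wavelet, the marginals of Eq. (6) can be computed with a simple cascade of pairwise sums and differences; a sketch, where the final normalization to unit sum is one possible convention, not necessarily the one of [3]:

```python
import numpy as np

def haar_marginals(x):
    """One marginal per scale of a dyadic Haar decomposition:
    the sum of absolute detail coefficients at that scale
    (Eq. (6)), normalized to sum to one."""
    a = np.asarray(x, dtype=float)
    marginals = []
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation
        marginals.append(np.sum(np.abs(d)))
    m = np.array(marginals)
    return m / m.sum()
```

For a signal of length $2^J$, this yields exactly $J$ marginals, one per frequency band.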
Once the wavelet decomposition of a two-dimensional image is defined, it is straightforward to extend marginals to images. Let $\phi$ be a scaling function and $\psi$ its corresponding mother wavelet. For the purpose of image analysis [6], three different 2D mother wavelets are generated from the tensor products of the wavelet and the scaling function: $\psi^1 = \phi \otimes \psi$, $\psi^2 = \psi \otimes \phi$ and $\psi^3 = \psi \otimes \psi$. Then, with $\tau_1$ and $\tau_2$ the translations along the $x$ and $y$ axes respectively, we have the following wavelet coefficients for an image $I$:

$$c^{o}_{s, \tau_1, \tau_2} = \left\langle I, \psi^{o}_{s, \tau_1, \tau_2} \right\rangle, \quad o \in \{1, 2, 3\} \quad (7)$$

As proposed in [19], by summing over the extra indices ($\tau_1$, $\tau_2$ and $o$) in Eq. (6), it is possible to extend this feature to images.
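A sketch of this 2D extension for a separable Haar decomposition, summing the absolute detail coefficients of the three orientations at each scale (again, the unit-sum normalization is one convention among others):

```python
import numpy as np

def haar_marginals_2d(img):
    """Image marginals: per scale, sum of |detail coefficients|
    over translations and the three orientations of Eq. (7)."""
    a = np.asarray(img, dtype=float)
    marginals = []
    while min(a.shape) > 1:
        # separable Haar step: filter rows, then columns
        lo_r = (a[0::2] + a[1::2]) / np.sqrt(2)
        hi_r = (a[0::2] - a[1::2]) / np.sqrt(2)
        ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / np.sqrt(2)
        lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / np.sqrt(2)
        hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / np.sqrt(2)
        hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / np.sqrt(2)
        # one marginal per scale, pooling the three orientations
        marginals.append(np.abs(lh).sum() + np.abs(hl).sum()
                         + np.abs(hh).sum())
        a = ll                          # recurse on approximation
    m = np.array(marginals)
    return m / m.sum()
```

This is why dyadic image dimensions are needed (see Section 3.1): each step halves both sides of the approximation image.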
In this work, marginals are used as a baseline. Indeed, this feature is very sensitive to the waveform used for analysing the signal. A method has been proposed [19] to optimize this waveform and to select the relevant scales for classification.
For a given image, we decomposed it using a Haar wavelet (the wavelet decomposition was performed using the Wavelab 850 toolbox for Matlab, available at http://statweb.stanford.edu/~wavelab) and then computed the marginals of the decomposition for every scale. Using the labeled dataset, we normalized the data to zero mean and unit variance, and then a linear kernel was used. Note that, since normalized marginals are positive and sum to one, the use of distance substitution kernels [5] may be investigated.

3 Methodology
3.1 Image preprocessing
For extracting wavelet marginals, we need the images to have dyadic dimensions. Hence, we resized the images (using the Matlab function imresize) from their original size to dyadic sizes. Finally, based on our validation results, marginals of Haar wavelets computed on the resized images seemed to be the most efficient.
In our first round of experiments, we applied gradient-based covariance matrices to the raw images and obtained very low (almost random) validation results. The gradient-based features being very localised, they seemed to miss some important information in the data. Hence, we resized the images by different downscaling factors in order to extract more global information; in our validation procedure, one of these factors gave the best validation performance.
However, when computing our covariance matrices, we observed very unstable results. As this estimator is not robust to outliers, we applied the FastMCD algorithm to approximate the MCD estimator. For gradient based features, we obtained a boost in the validation results (compared to the empirical covariance matrices).
For the Gabor-based covariance matrices, the results were somewhat different. We observed that applying Gabor-based covariance matrices to rescaled images gave worse validation results. This may be because the Gabor filters used were already extracting global information from the raw images. The parameters of the Gabor filters were chosen based on their validation performance.
Note that, contrary to the gradient-based features, we did not use the FastMCD algorithm and only relied on the empirical covariance matrices of the Gabor features. Indeed, the FastMCD approximation led to poor validation results. However, this may only indicate that we should have better tuned the parameters of the FastMCD algorithm (e.g. by raising the number of trial subsamples).
3.2 Validation procedure
feature          | CmdMatgrad    | CovMatgab       | MarginalHaar
image size       |               |                 |
kernel type      | Log-Euclidean | Log-Euclidean   | linear
kernel parameter | identity      | Riemannian mean | -
validation       |               |                 |
LOO accuracy     |               |                 |
The rules of the competition specified that the competitors had to use an SVM classifier (we used the SVM toolbox for MATLAB available at http://asi.insa-rouen.fr/enseignants/~arakoto/toolbox/index.html). We tuned the regularization hyperparameter of the classifier using a leave-one-out (LOO) cross-validation procedure, with candidate values taken in a predefined set. When we produced two variants of the same method (for example, two different preprocessings of the raw images, or the same feature with different kernels), we selected the variant that achieved the best mean accuracy over the LOO procedure. We also report, as validation, the mean accuracy of the learned classifiers on the training dataset.
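The selection procedure can be sketched as follows (with scikit-learn instead of the MATLAB toolbox; the feature matrix, labels and candidate grid for the regularization parameter are placeholders, since the actual value set used in the competition is not reproduced here):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# placeholder data: 40 subjects, 5 features, balanced labels
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.arange(40) % 2

grid = [0.01, 0.1, 1, 10, 100]      # hypothetical candidate values
loo = LeaveOneOut()

# mean LOO accuracy for each candidate regularization value
scores = {C: cross_val_score(SVC(kernel='linear', C=C), X, y,
                             cv=loo).mean() for C in grid}
best_C = max(scores, key=scores.get)
```

With only 174 subjects in the challenge, LOO is affordable but, as discussed in Section 4, its estimate can be optimistic.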
We sum up the obtained results and the properties of the proposed approaches in Tab. 1.
4 Conclusion and perspectives
Based on the validation results, it was difficult to choose one of the three proposed methods. Indeed, the large gap in accuracy between the LOO and validation results led us to fear overfitting. On the other hand, despite worse results, the MarginalHaar features showed closer validation and LOO accuracies.
After consulting the competition organizers, we submitted the three methods and obtained surprising results.
4.1 Competition results
We first report our final results as announced by the organizers (the final results are available at http://www.univ-orleans.fr/i3mto/results).
In Tab. 2, we report the criteria used by the competition organizers:

- TP, True Positive: number of subjects with osteoporosis correctly identified
- FP, False Positive: number of control subjects incorrectly identified
- TN, True Negative: number of control subjects correctly identified
- FN, False Negative: number of subjects with osteoporosis incorrectly identified
- Sn, Sensitivity: defined as Sn = TP / (TP + FN)
- Sp, Specificity: defined as Sp = TN / (TN + FP)
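For concreteness, the last two criteria applied to the first-round MarginalHaar counts of Tab. 2:

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Sp = TN / (TN + FP)."""
    return tn / (tn + fp)

# MarginalHaar, first-round counts from Tab. 2:
sn = sensitivity(36, 22)   # 36 / 58, about 0.62
sp = specificity(38, 20)   # 38 / 58, about 0.66
```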
                 |          First results            |          Blind results            |
Method           | TP | FP | TN | FN | Sn   | Sp     | TP | FP | TN | FN | Sn   | Sp     | rank
MarginalHaar     | 36 | 20 | 38 | 22 | 0.62 | 0.66   | 19 | 10 | 19 | 10 | 0.66 | 0.66   | 1
CmdMatgrad       | 54 |  7 | 51 |  4 | 0.93 | 0.88   | 16 | 15 | 14 | 13 | 0.55 | 0.48   | 5
CovMatgab        | 46 |  7 | 51 | 12 | 0.79 | 0.88   | 13 | 14 | 15 | 16 | 0.45 | 0.52   | 6
From those criteria, the challenge organizers derived six other criteria used to rank the submitted methods. In Tab 2, we report the mean rank on those criteria as communicated by the challenge organizers.
From the gap between the first and blind results, it clearly appears that both covariance-based methods overfitted and obtained disappointing results.
It should be stated that the marginal-based method was ranked first on every one of the criteria used by the organizers.
4.2 Perspectives
As stated in the introduction, we did not apply state-of-the-art features in texture classification but rather tried to apply previously proposed work. Indeed, it should be noted that the recently proposed scattering transform [13, 2] may be a more powerful texture descriptor than what we proposed.
It should also be noted that wavelet marginals have been used in a rather different setting than in their original proposition [19]. Indeed, we restricted ourselves to a single Haar wavelet basis; but since the choice of the wavelet parametrizing a marginal has a strong impact, we should have validated this choice more carefully.
In the original method, the wavelet parameter was selected through an MKL-based approach, which could not be applied in this competition since the rules restricted classifiers to SVMs only (implicitly forbidding MKL methods). Moreover, as the final results suggest, the main issue in this competition resides in overfitting; since an MKL approach has more degrees of freedom, it should be very carefully tuned.
In the same line of thought, combining different features (through an MKL method) has shown very good practical results in [4]. For a real-world application where enough data are available, this would be a very promising direction for future work. Yet, in the context of a data competition (with only limited data), such an MKL approach may lead to overfitting.
Acknowledgments
Part of this work was carried out in the LITIS laboratory with the support of a grant from LeMOn ANR11JS0210, and it was finished at the Tokyo Institute of Technology with the support of a JSPS fellowship.
References
 [1] Alexandre Barachant, Stéphane Bonnet, Marco Congedo, and Christian Jutten. Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing, 112:172–178, 2013.
 [2] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8):1872–1886, 2013.
 [3] Dario Farina, Omar Feix do Nascimento, Marie-Françoise Lucas, and Christian Doncarli. Optimization of wavelets for classification of movement-related cortical potentials generated by variation of force-related parameters. Journal of neuroscience methods, 162(1):357–363, 2007.
 [4] Peter Gehler and Sebastian Nowozin. On feature combination for multiclass object classification. In Computer Vision, 2009 IEEE 12th International Conference on, pages 221–228. IEEE, 2009.
 [5] Bernard Haasdonk and Claus Bahlmann. Learning with distance substitution kernels. Pattern Recognition, pages 220–227, 2004.
 [6] Stephane Mallat. A wavelet tour of signal processing: the sparse way. Academic press, 2008.

 [7] M. Moakher. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 26(3):735–747, 2005.
 [8] Hélio Palaio and Jorge Batista. A kernel particle filter multi-object tracking using Gabor-based region covariance matrices. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 4085–4088. IEEE, 2009.
 [9] Fatih Porikli and Tekin Kocak. Robust license plate detection using covariance descriptor in a neural network framework. In Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS), page 107, 2006.
 [10] Fatih Porikli, Oncel Tuzel, and Peter Meer. Covariance tracking using model update based means on Riemannian manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2006.
 [11] Peter J Rousseeuw. Least median of squares regression. Journal of the American statistical association, 79(388):871–880, 1984.
 [12] Peter J. Rousseeuw and Katrien Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
 [13] Laurent Sifre and Stéphane Mallat. Combined scattering for rotation invariant texture analysis. In European Symposium on Artificial Neural Networks, 2012.
 [14] Jing Yi Tou, Yong Haur Tay, and Phooi Yee Lau. Gabor filters as feature images for covariance matrix on texture classification problem. In Advances in Neuro-Information Processing, pages 745–751. Springer, 2009.
 [15] Oncel Tuzel, Fatih Porikli, and Peter Meer. Region covariance: A fast descriptor for detection and classification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 589–600, 2006.
 [16] Oncel Tuzel, Fatih Porikli, and Peter Meer. Human detection via classification on riemannian manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2007.
 [17] Sabine Verboven and Mia Hubert. Libra: a matlab library for robust analysis. Chemometrics and intelligent laboratory systems, 75(2):127–136, 2005.
 [18] Florian Yger. A review of kernels on covariance matrices for bci applications. In Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on, pages 1–6. IEEE, 2013.
 [19] Florian Yger and Alain Rakotomamonjy. Wavelet kernel learning. Pattern Recognition, 44(10):2614–2629, 2011.