Texture Retrieval via the Scattering Transform

by   Alexander Sagel, et al.

This work studies the problem of content-based image retrieval, specifically, texture retrieval. It focuses on feature extraction and similarity measure for texture images. Our approach employs a recently developed method, the so-called Scattering transform, for the process of feature extraction in texture retrieval. It shares a distinctive property of providing a robust representation, which is stable with respect to spatial deformations. Recent work has demonstrated its capability for texture classification, and hence as a promising candidate for the problem of texture retrieval. Moreover, we adopt a common approach of measuring the similarity of textures by comparing the subband histograms of a filterbank transform. To this end we derive a similarity measure based on the popular Bhattacharyya Kernel. Despite the popularity of describing histograms using parametrized probability density functions, such as the Generalized Gaussian Distribution, it is unfortunately not applicable for describing most of the Scattering transform subbands, due to the complex modulus performed on each one of them. In this work, we propose to use the Weibull distribution to model the Scattering subbands of descendant layers. Our numerical experiments demonstrated the effectiveness of the proposed approach, in comparison with several state of the arts.


page 4

page 7

page 8

page 9


Texture feature extraction in the spatial-frequency domain for content-based image retrieval

The advent of large scale multimedia databases has led to great challeng...

Texture retrieval using periodically extended and adaptive curvelets

Image retrieval is an important problem in the area of multimedia proces...

Color Texture Image Retrieval Based on Copula Multivariate Modeling in the Shearlet Domain

In this paper, a color texture image retrieval framework is proposed bas...

A Novel Approach to Texture classification using statistical feature

Texture is an important spatial feature which plays a vital role in cont...

Dynamic Texture Recognition via Nuclear Distances on Kernelized Scattering Histogram Spaces

Distance-based dynamic texture recognition is an important research fiel...

Binary Distance Transform to Improve Feature Extraction

To recognize textures many methods have been developed along the years. ...

Building Function Approximators on top of Haar Scattering Networks

In this article we propose building general-purpose function approximato...

I Introduction

I-a Overview

Content-based image retrieval (CBIR) is a special case of image classification. It can be viewed as the process of assigning a query image to a set of image classes, where each class represents a database image. Content-based hereby refers to the mode of formulating the search query. As opposed to metadata-based image search, which relies on pre-labeling the images beforehand, a CBIR system retrieves the best matches within the database with respect to the visual similarity to the query image.

In CBIR, the concepts of feature extraction (FE) and similarity measure (SM) play crucial roles. FE converts an image into a feature vector

of numerical values with the aim to produce a low-dimensional, yet sensible representation of the input image in the context of some particular application. An SM assigns a numerical value to a pair of two feature vectors. In this work, we assume that the SM is nonnegative and that a smaller SM value indicates a higher similarity and vice versa. A typical CBIR system is depicted in Fig.

1. The images to be extracted are processed via an FE algorithm and the extracted feature vectors (signatures) are stored in a database. The query image is fed to the same FE algorithm and the output is compared with all the feature vectors in the database by means of the defined SM. As a result, the images with the lowest SM values are returned. Consequently, designing a CBIR system boils down to the construction of an FE/SM framework.












Fig. 1: A typical architecture of content-based image retrival.

Nowadays, CBIR is employed to deal with big databases and vast amounts of data. Thus, in additional to high retrieval efficiency, execution speed is of great concern. This requires sufficient dimensionality reduction within the FE and a computationally efficient SM.

I-B Related Work and Our Contribution

Extracting sensible information from images has been a field of research in digital signal processing for roughly half a century now. Naturally, the choice of the FE method depends on the specific application. For instance, when comparing images of landscapes, pictures taken in a forest will have different color distributions from those taken in a desert, so an FE based on the color properties of the image is a reasonable approach. On the other hand, images of objects are strongly characterized by shapes, so in this case the FE could rely on extracted contours. This work focuses on CBIR for textures. The earliest approaches for texture FE, such as the gray-level co-occurrence matrices (GLCM) [1] and the biologically motivated proposed Tamura features [2] have been presented in the 1970s and are often based on numerical measures for how nearby pixels relate to each other.

More recent approaches include incorporating spatial-frequency representations of the images [3], such as the Fast Wavelet Transform (FWT) [4], Gabor frames [5] or Curvelet frames [6]. A conventional approach is designing the features based on the energy of the respective transform coefficients, but it can be beneficial to look at their statistical properties [7, 8]. Thus, FE approaches based on the histograms of filterbank transforms have become more prominent, cf. [9, 10, 11, 12, 13, 14, 15]. Techniques of this kind are referred to as Subband Histogram methods in the work.

Linear filterbank transforms feed the input image to a bank of frequency-selective filters, yielding a set of band-pass signals as a representation. One problem of constructing FE based on such a decomposition is that the higher frequency subbands are prone to deformations in the spatial domain. Thereby, even slight deformations of an image will yield significantly different transform coefficients. However, simply neglecting the higher-frequency subbands is not desirable since they carry important information about such distinctive features as edges and corners. As a remedy, Mallat introduced the Scattering transform [16], a non-linear signal representation involving filterbanks and the modulus operation which transforms high-frequency components into low-pass representations. The Scattering transform appears to perform well in texture classification tasks [17, 18, 19]. The key contribution of this work is to define a statistical model in order to design subband histogram FE for texture images based on their Scattering transform coefficients. Furthermore, we introduce a similarity measure based on the extracted features, with the property of being a Kernel and thus can be interpreted as an inner product of Textures transformed to some Hilbert space.

The paper is organized as follows. Section II provides necessary mathematical preliminaries of the problem of CBIR, subband histogram methods, and the basics of the Scattering transform. In Section III we develop a statistical framework of CBIR based on the Scattering transform. In Section IV several numerical experiments are presented to investigate the performance of the proposed approach in comparison with the state of the art methods.

Ii Mathematical Preliminaries

Ii-a Notations

Unless stated otherwise, a signal denotes an element of the Lebesgue space . The terms almost everywhere and almost all refer to conditions which hold for all , where can be any null set. For the sake of simplicity, we refer to as a Hilbert space of functions, even though it would be more precise to call them equivalence classes of functions that are equal almost everywhere. Bold-faced lowercase letters or describe vectors, while regular lowercase letters like or describe scalar values. Depending on the context, uppercase letters stand either for scalar values or for matrices. For a complex value , denotes its conjugate, and the real and imaginary part, respectively. Angle brackets denote the scalar product of and the canonical norm induced by it. An asterisk denotes the convolution of two signals .

Ii-B Subband Histogram Methods

The coefficients produced by a transform of the query image can be viewed as a set of realizations of a random experiment. Let describe the probability density function (PDF) of the transform coefficients of some database image.

The likelihood of a set of random realizations for a PDF is defined as


and is a measure for the probability that is subjected to . That is to say, can be viewed as a measure of similarity between the query image and the database images. Since the natural logarithm is a monotone function, this is also true for the log-likelihood .

Let denote the set of the PDFs of a number of database images. The best match for a query can thus be determined via


Expression (2) is called a Maximum Likelihood (ML) solution.

Assume the query is subjected to a PDF

. Then, by the law of large numbers, the log-Likelihood

can be approximated by the negative cross-entropy


Assuming a generative model, the cross-entropy defines a similarity measure between the respective query and database image.

One of the most well-studied transforms in image processing is the multiresolution decomposition produced by the FWT. In particular, it is known that the histograms of its band-pass subbands can be modeled by the Generalized Gaussian Distribution (GGD), c.f.[20], defined as


where denotes the Gamma-Function,


and the parameter determines the scale of the distribution while describes the shape. The cross-entropy between two GGDs ist given by


Equation (6

) provides a parametrized SM for texture images. The GGD parameters can be estimated from each image using FE based on ML which together with the SM in (

6) defines a complete framework for CBIR. This approach was explored and discussed in [9]

, though it was equivalently formulated in terms of the Kullback-Leibler divergence (KLD) rather than the cross-entropy. It can be viewed as a blueprint for other subband histogram methods for which the motivation is twofold. First, the FWT is not the only filterbank transform for images and is not necessarily the best basis for texture FE. Furthermore, even though the cross-entropy can be rigorously motivated by ML optimization, it lacks any geometrical interpretation, which poses the question, whether there are more suitable SMs for parametrized probability models.

Ii-C The Scattering transform

Ii-C1 Definition

Let be a rotationally symmetric window function with low-pass characteristics. Let and be fixed and a finite group of rotation matrices. With , we define the wavelet as


Further, let us assume a low-pass and rotationally symmetric scaling function [21] and define


such that for the respective Fourier transforms



holds, for almost all . Then the set


constitutes a Parseval-tight frame [21]. It will be assumed to span in the following. Note that for two signals , the equation



At the core of the Windowed Scattering transform (WST) [16] is a dyadic wavelet decomposition of the input signal with the complex modulus performed on the band-pass components, defined as


The modulus operation traverses some of the energy of the band-pass signals towards lower frequencies. Therefore, can be applied to the output signals again. This yields a tree structure like the one depicted in Fig. 2. Note that (12) yields an infinite (but countable) number of output signals. However, in practice we deal with bandlimited input signals . Hence, there exists an integer with , such that Eq. (12) needs to be evaluated for only, which corresponds to a tree with finite breadth.

Basically, the idea of the WST is to apply successively to the input signal and only keep the low-pass signals, i.e. to neglect signals represented by the black nodes in Fig. 2.

Fig. 2: WST tree produced by successive application of on the input signal . Lowpass signals (The WST subbands) are depicted as white nodes. The complex moduli of band-pass signals are depicted as black nodes.

The WST along the path of scaling factors and rotations is defined as


Let us denote by the (ordered) set of all possible paths of size to . We can then define the Finite-path WST as


Ii-C2 Properties

The output signals of the Finite-path WST are the elements of (14), also referred to as WST subbands in the following, which are all low-pass signals due to the filtering with . With increasing , captures more and more energy of the input signal and the WST representation becomes more expressive as the maximum path length grows. Let the Scattering norm be defined as


As a consequence of the tight frame property (9) of , the infinite-path WST is unitary, i.e.


In practice, maximum path depths of are expected to capture all essential information [22, 23]. Actually reconstructing from its WST involves a phase recovery problem. However, it is known [24] that digital realizations of wavelet representations similar to those employed in the WST are uniquely determined by their complex modulus.

The WST is non-expansive [16], i.e. for two signals , it can be shown [16] that for any positive integer ,


holds, implying that the WST is robust with respect to additive noise.

Since WST is a representation consisting solely of low-pass signals, it is stable with respect to small spatial translations and deformations. Specifically, let be a deformed or translated version of . Under certain mild assumptions about the underlying wavelet frame, it has been proven that the error is bounded asymptotically, cf. [16]. Note, that we are given some freedom of choice in the wavelet frame which allows for considerable flexibility in terms of parameters such as frequency selectivity or directionality.

Due to its stability and flexibility, the WST is a convenient signal representation and can be used as the foundation for the FE from images.

Ii-C3 Normalized WST

In order to reduce redundancy and to increase invariance to distortions, the Normalized WST (NWST) [23] was introduced. Let be the predecessor of , i.e. implies .

Let denote a very narrow-band lowpass blurring filter. For the layers , the NWST is defined as


In words, each subband of the WST is normalized by the respective parent subband, except for the subbands in the first layer which are normalized by the mean of the modulus of the input signal. In practice, a small constant is added to the denominator in order to avoid division by zero.

Iii Statistical Scattering CBIR

Iii-a Subband Modeling

We propose to model the gray-value distributions of the different WST subbands with parametrized PDFs and describe the images in terms of their respective parameters to obtain a complete FE mechanism on top of the WST.

The most distinctive features in textures are those of higher frequencies and are thus carried by the layers . These layers contain signals of the form


Clearly, the modulus eliminates all negative values which is why a symmetric model such as the GGD is not appropriate anymore. The histograms of signals from descendant layers suggest a PDF that occupies only positive values and can describe a skew to the left, which makes the

Weibull Distribution (WD) a nearby choice. In fact, the WD was already successfully employed in the modeling of complex wavelet coefficients [15]. Fig. 3 exemplarily shows histograms of WST subbands of different textures with a fitted WD curve. Despite variations in skewness and spread, the WD fittings describe the actual histograms fairly well. The PDF of the WD is defined for as


Analogous to the GGD, is the scale parameter, whereas determines the shape.

Fig. 3: Histograms (blue) and respective ML Weibull fittings (red) of WST subbands from ayers and for different texture patches from the UIUC texture database

The above argument is purely empirical. The following discussion aims to justify further the choice of the WD as a model for WST subbands (19) in the layers : Like for the layer , we neglect the impact of the low-pass filter . The approximately analytical band-pass filters produce signals with similar zero-mean distributions in their real and imaginary parts. Moreover, it is known that band-pass components of a natural image can be well modeled by a GGD. This is also a reasonable assumption for band-pass components of Scattering subbands since they resemble natural images under dim lif´ghting conditions. It is therefore natural to assume the GGD with the same parameters as a model for the distribution of the real and imaginary parts of the band-pass signals involved in the WST. By neglecting statistical dependence between real and imaginary components, let us define a “complex GGD” with statistically independent real and imaginary parts as


In order to determine the PDF for the modulus , we need to fix and integrate over the circles centered at 0 in the complex plane:


Evaluating the integral analytically is demanding, if not impossible. However, it is known that the modulus of a complex variable with statistically independent and Gaussian distributed real and imaginary parts abides the Rayleigh distribution, which can be easily verified by substituting and into Eq. (22), i.e.


The usage of the GGD for modeling FWT subbands was motivated by the need to model histograms which are more or less peaked in shape than it is the case for the Gaussian distribution. For this purpose, the additional shape parameter was introduced. This corresponds to introducing an additional shape parameter to the Rayleigh distribution. Altering the peakedness of the real and imaginary part corresponds roughly to altering the skewness of the respective modulus distribution. From Eq. (22), we can observe that , so a good alternative to for modeling the WST subbands would be a two-parameter generalization of the Rayleigh distribution which is controllable in its skewness without changing its value at 0. For , these requirements are fulfilled by the WD.

Even though it is sensible to model the WST with its Weibull coefficients, it is still questionable if this decision is justifiable for the NWST. It is certainly true for the first layer since it only involves an overall scaling. Unfortunately, this can not be assumed for the other descendant layers. Nevertheless, experiments show, that in practice this assumption still holds. However, for the most important ranges of , the multiplicative inverses of WD distributed samples exhibit histograms which again can be well modeled by the WD as can be seen in Figure 4. Thus, the normalized coefficients in (18) for are products of values whicht are close to be Weibull distributed and statistically independent. Thus the WD will again dominate the subband histograms.

Fig. 4: Histogram of Weibull distributed samples and their inverses

We employ a numerical approach to estimate the Weibull parameters via minimizing the corresponding ML function. The derivation of the algorithm can be found in Appendix -A.

Iii-B Scattering Similarity

Iii-B1 The Bhattacharyya Kernel

Both the cross-entropy and the KLD can be derived for the WD in order to define an SM similar to the on discussed in Subsection II-B, cf. [25]. However, the derivation leads to very complex expressions.

As an alternative, we propose an SM based on the Bhattacharyya coefficient. For a pair of PDFs , it is defined as


Expression (24

) is a popular choice for Kernels in the machine learning community

[26] and as such imposes a Hilbert space structure on probability based features.

Iii-B2 Weibull Similarity

Our aim is to derive a simple approximation of (24) for a pair of Weibull distributions . To this end, let us assume that . This is a justifiable assumption since the distributions are always compared for each subband individually. Let us further define


This ultimately leads to


For the sake of convenience, let us write the arithmetical and geometrical mean of two values



respectively. From the last equation of (26) we define our similarity measure for pairs of Weibull PDFs as


In the derivation of , we replace and by each time it appears as the exponent of and , respectively. Even though this step is not necessary to eveluate the integral in (26), it ensures a straight-forward understanding of (28), with the first factor measuring the similarity of the scale and the second factor measuring the similarity of the shape. Moreover, just like (24), the expression (28) defines a Kernel for Weibull PDFs, as it is shown in the following proposition.

Proposition 1.

Expression (28) is a Kernel for Weibull parameters.


Note that for the approximation in (26) becomes an equality. I.e.,


is a Kernel, since it is a Bhattacharyya coefficient of two PDFs. So is


because both the shape and the scale parameter of the WD are constrained to be from the same domain . Therefore, Expression (28) is the product of two Kernels and as such a Kernel itself. ∎

The Kernel property of (28) provides us with some geometrical intuition, hence makes it a good choice as the fundamental building block of an SM. This includes the fact that it naturally induces a metric, which incorporates the notion of similarity as a geometric distance as is for instance the motivation behind multidimensional scaling [27].

It also enables the (N)WST to be conveniently subjected to many machine learning techniques such as Support Vector Machines.

Iii-B3 Similarity for Multiple Subbands

Consider two PDFs

of two statistically independent random variables



It can be easily seen that


Carrying over this insight to the derivation (26) yields, for a pair of sets of independent WDs with the parameter vectors ,


For a pair of WST transforms with subbands, it is therefore straightforward to derive an SM. Since by our definition a low SM indicates a high similarity we apply the logarithm and reverse the sign which finally leads us to


Iii-C Implementing the Framework

The two previous sections provide the theoretical basis for building a complete CBIR based on WST Subband histograms. In order to extract the features, the WST (Section II-C) is performed on each image. Each subband of the layers is then subjected to ML in order to extract the WD parameters. If the number of subbands is , this amounts to feature vectors of size . In order to compare the query feature vector to the feature vectors in the database, the SM (34) us employed. As most of these steps can be performed independently of each other, the whole system is well suited to be implemented in a parallel manner, for example in applications for Big Data.

Iv Numerical Experiments

Iv-a Experimental Settings

Iv-A1 Implementation

All experiments were performed in a Matlab 2014a environment. The code reproducing the key results is available online111https://www.ldv.ei.tum.de/uploads/media/texture_retrieval_scattering_14.zip. The WST was performed by means of the ScatNet222http://www.di.ens.fr/data/software/scatnet/ toolbox, version 0.2. The Weibull parameters were extracted via wblfit from the Statistical Toolbox. Since it is the most standard subband histogram technique, we also implemented the FWT+GGD+KLD method according to [9] or comparison. Additionally, we implemented a method based the Dual-Tree Complex Wavelet Transform (DT-CWT), inspired by [15]. The DT-CWT was performed by the DT-CWT Transform Pack333http://www-sigproc.eng.cam.ac.uk/Main/NGK, version 4.3. Similarly the subband histograms were modeled by the WD, but the KLD was replaced by the proposed SM in (34) for the sake of comparability.

Iv-A2 Choice of parameters

Along with the regular WST, all experiments were also conducted with its normalized version as defined in (18). The directionality parameter, corresponding to the cardinality of the rotation group , is fixed to be . All computations were performed with double precision and the subbands of the WST are critically sampled according to the Shannon-Nyquist sampling theorem. The bandwidth of the low-pass filter is chosen in a way such that it produces signals of size , in accordance with [9]. The WST is evaluated for the maximum path lengths and . In theory, a linear increase in leads to an exponential increase in the number of subbands. However, since the modulus operation hardly traverses any energy toward higher frequencies, they become negligible with increasing path length. Apart from the mentioned, default settings of ScatNet were used. In particular, the Morlet Wavelet was used as the band-pass filter on each layer of the WST.

The FWT relies on the D2 4-tap Daubechies wavelet [21] as the underlying multiresolution representation. The DT-CWT was run with the options antonini and qshift_a. In both cases, the size of the smallest subband was , which corresponds to three levels of decomposition.

Iv-A3 Data

Fig. 5: Database D1 from left to right and top to bottom, with row and column in brackets: Bark0 (1,1), Bark6 (1,2), Bark8 (1,3), Bark9 (1,4), Brick1 (1,5), Brick4 (1,6), Brick5 (1,7), Buildings9 (1,8), Fabric0 (2,1), Fabric4 (2,2), Fabric7 (2,3), Fabric9 (2,4), Fabric11 (2,5), Fabric14 (2,6), Fabric15 (2,7), Fabric17 (2,8), Fabric18 (3,1), Flowers5 (3,2), Food0 (3,3), Food5 (3,4), Food 8 (3,5), Grass1 (3,6), Leaves8 (3,7), Leaves10 (3,8), Leaves11 (4,1), Leaves12 (4,2), Leaves16 (4,3), Metal0 (4,4), Metal2 (4,5), Misc2 (4,6), Sand0 (4,7), Stone1 (4,8), Stone4 (5,1), Terrain10 (5,2), Tile1 (5,3), Tile4 (5,4), Tile7 (5,5), Water5 (5,6), Wood1 (5,7), and Wood2 (5,8).
Fig. 6: Database D2 (patches) from left to right and top to bottom: Bark1, Bark2, Wood2, Wood3, Water, Marble, Floor1, Floor2, Pebbles, Wall, Brick1, Glass1, Glass2, Carpet1, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid.

The database D1 is the same as used in [9] and is based on the VisTex database444http://vismod.media.mit.edu/vismod/imagery/VisionTexture/distribution.html. It consists of 40 texture images of the size , each divided into 16 non-overlapping patches. This amounts to 640 image samples of 40 different textures depicted in Fig. 5. The database D2 was generated from images of the following subset of the UIUC texture database555http://www-cvr.ai.uiuc.edu/ponce_grp/data/: Bark1, Bark2, Wood2, Wood3, Water, Marble, Floor1, Floor2, Pebbles, Wall, Brick1, Glass1, Glass2, Carpet1, Carpet2, Wallpaper, Fur, Knit, Curdoroy, Plaid. Two images from each class are used to create five overlapping patches which are then scaled down to half the edge size. Hence, we get a database of 20 different texture classes each containing ten patches. Fig. 6 depicts one patch from each utilized image. All image patches are normalized to zero mean and unit energy, in order to avoid any bias caused by the overall lighting condition of each original texture image. The set of all patches generated from the same texture is considered a class. Its cardinality will be denoted by in the following. Consequently, for D1 and for D2. For each image patch, the most similar patches were retrieved. The retrieval rate for each patch is defined as the ratio of the number of retrieved patches from the same class to . The overall retrieval rate is the average of the retrieval rates for all the images in the database.

Iv-B Results

Iv-B1 Overview

The retrieval rates are summarized in Table I.

Database D1 75.50% 78.18% 78.93% 78.14% 84.90% 85.30%
Database D2 52.39% 59.61% 66.50% 63.78% 66.94% 65.17%
TABLE I: Performance of WST + WD and NWST + WD in comparison with FWT + GGD and DT-CWT + GGD on databases 1 and 2
WD-HMM Rotated Wavelets
76.93% 78.40% 80.05% 82.81%
TABLE II: Performance of state of the art methods on Database D1

While WST+WD+ produces a similar result as DT-CWT+WD+ on Database D1, NWST+WD+ is able to outperform all of the competing frameworks by for and for . Database D1 is widely used as a benchmark for CBIR retrieval. In order to provide a sense for the state of the art, Table II

summarizes the results from recent publications on comparable approaches: DWT + Generalized Gamma Distribution (G

D) [13]

, Wavelet Domain Hidden Markov Models (WD-HMM)

[10] and Rotated Complex Wavelets [28]. Also, the original result for FWT+GGD+KLD from [9] was included, since we were not able to reproduce it. To the authors’ best knowledge, the best published result for this experiment so far was produced by Rotated Complex Wavelets with a retrieval rate of which is still outperformed by 2.09% by NWST+WD+ with and 2.49% with .

Database D2 is considerably smaller than D1, but involves more variation within the classes, for instance, in terms of camera angle and deformation. Again, the WST greatly improves the retrieval performance. However, this time the regular WST does not fall behind the NWST. Also, increasing the maximum path length up to harms the performance. However, we can conclude that NWST+WD+ performs comparably well and produces the best results in both our test settings.

Iv-B2 Impact of Directionality

In general, natural textures contain strongly directional features. Hence, it is often beneficial to incorporate a frame of higher directionality as it is done for the Rotated Wavelets approach, for example. The DT-CWT+WD+ method which mostly draws its benefits from increased directional selectivity in comparison with standard FWT based techniques is consistently and significantly outperformed by NWST+WD+, despite the same directionality properties. That is to say, there are reasons to believe that the success of the WST based methods is not exclusively due to increased directionality.

Iv-B3 Impact of Textural Structure

Fig. 7: Retrieval rates for each texture class of D1 individually. From left to right, with index in brackets: Bark0 (1), Bark6 (2), Bark8 (3), Bark9 (4), Brick1 (5), Brick4 (6), Brick5 (7), Buildings9 (8), Fabric0 (9), Fabric4 (10), Fabric7 (11), Fabric9 (12), Fabric11 (13), Fabric14 (14), Fabric15 (15), Fabric17 (16), Fabric18 (17), Flowers5 (18), Food0 (19), Food5 (20), Food8 (21), Grass1 (22), Leaves8 (23), Leaves10 (24), Leaves11 (25), Leaves12 (26), Leaves16 (27), Metal0 (28), Metal2 (29), Misc2 (30), Sand0 (31), Stone1 (32), Stone4 (33), Terrain10 (34), Tile1 (35), Tile4 (36), Tile7 (37), Water5 (38), Wood1 (39), and Wood2 (40).
Fig. 8: Retrieval Rate

To further analyze the capabilities of the involved methods, consider Fig. 7. It shows the retrieval rates for each class in Database D1 produced by DT-CWT as well as by WST and NWST with . DT-CWT + GGD performed well in comparison to the WST based techniques for the classes Brick5, Buildings9, Metal0 and Wood2 and badly for Bark9, Fabric4, Leaves10 and Stone1. In general, it appears that DT-CWT is most suitable for flat and (piecewise) smooth surfaces, while the two methods involving Scattering seem to be preferable for surfaces that are rather uneven, be it because they are rough or because they are bent or dented. This correlates with the premise that Scattering coefficients are less sensitive to spatial deformations, since on the one hand, images of rough surfaces carry significant fractions of high-frequency components and on the other hand spatial deformation in the image is often caused by locally bending or denting the depicted material. More specifically, NWST performed comparably well for the classes Bark0, Bark6, Flowers5, Terrain10 and regular WST for the classes Bark8, Fabric11, Grass1, Stone4. This suggests that WST works better on fine structures while NWST yields better results for coarse structures.

This assumption is corroborated by the following experiment. Fig. 8 depicts the retrieval rates for the Database D2 after blurring it with a Gaussian filter of different bandwidths. Unsurprisingly, the retrieval performance declines with stronger blurring for all methods. However, it is evident that the retrieval rate for WST declines significantly faster than the other two. This can be interpreted as a consequence of the distortion invariance of normalized Scattering coefficients. More importantly, this confirms that the regular WST method relies on fine features since they are most heavily affected by blurring. On the other hand, images of coarse patterns and flat and piecewise smooth surfaces are less affected by blurring such that the other two methods appear to be relatively robust.

V Conclusion and Outlook

In this work, we propose a subband histogram FE method based on the statistics of the WST of a texture image. Our derivation and analysis demonstrate how to extract statistical features from a WST representation of a texture image by means of matching a Weibull model via ML and how to measure the similarity of the respective feature vectors via the KLD. The proposed techniques come in handy when it comes to reducing the enormous degree of redundancy introduced by the WST since they represent each subband by as much as two real numbered parameters. The experiments show that the proposed method can outperform comparable algorithms based on filter bank transforms in terms of retrieval accuracy when designed as a CBIR system. Regular WST coefficients seem to work better on fine structures and Normalized WST coefficients on coarse structures.

Since approaches for rotation invariant Scattering representations are already available [17, 18], it appears feasible to extend the presented ideas towards a rotation invariant CBIR framework which would allow for more general problem settings.

The summation in (34) was partly motivated by assuming statistical independence of the subbands. However, neither does statistical independence hold for the WST subbands, nor for the subbands of other filterbank transforms, in practice. Accounting for statistical dependence in the model could lead to significant improvements in efficiency [12, 10].

-a Estimation of Weibull parameters

In this section, we employ a numerical optimization approach to estimate the Weibull parameters. A more detailed derivation can be found in [29]. The log-likelihood amounts to


The optimal parameters are identified by solving


The first derivatives of with respect to and are computed as




Setting Eq. (38) to be equal to uniquely determines for . Solving for yields


Substituting (39) in Eq. (37) and equating it with characterize the critical points of in terms of for , i.e.


Since the function as defined in Eq. (35) is concave with respect to , a numerical algorithm such as the Newton-Raphson method can be used to obtain . Substituting this solution into Eq. (39) yields .


  • [1] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” Systems, Man and Cybernetics, IEEE Transactions on, no. 6, pp. 610–621, 1973.
  • [2] H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding to visual perception,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 8, no. 6, pp. 460–473, 1978.
  • [3] T. Randen and J. Husoy, “Filtering for texture classification: a comparative study,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 4, pp. 291–310, Apr 1999.
  • [4] J. Smith and S.-F. Chang, “Transform features for texture classification and discrimination in large image databases,” in Image Processing, 1994. Proceedings. ICIP-94., IEEE International Conference, vol. 3, Nov 1994, pp. 407–411 vol.3.
  • [5] B. S. Manjunath and W.-Y. Ma, “Texture features for browsing and retrieval of image data,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, no. 8, pp. 837–842, 1996.
  • [6] I. Sumana, M. Islam, D. Zhang, and G. Lu, “Content based image retrieval using curvelet transform,” in Multimedia Signal Processing, 2008 IEEE 10th Workshop on, Oct 2008, pp. 11–16.
  • [7] G. Van de Wouwer, P. Scheunders, and D. Van Dyck, “Statistical texture characterization from discrete wavelet representations,” Image Processing, IEEE Transactions on, vol. 8, no. 4, pp. 592–598, 1999.
  • [8] N. Vasconcelos, “On the efficient evaluation of probabilistic similarity functions for image retrieval,” Information Theory, IEEE Transactions on, vol. 50, no. 7, pp. 1482–1496, July 2004.
  • [9] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance,” IEEE Trans. Image Processing, vol. 11, no. 2, pp. 146–158, 2002.
  • [10] ——, “Rotation invariant texture characterization and retrieval using steerable wavelet-domain hidden markov models,” Multimedia, IEEE Transactions on, vol. 4, no. 4, pp. 517–527, 2002.
  • [11] D.-Y. Po and M. Do, “Directional multiscale modeling of images using the contourlet transform,” in Statistical Signal Processing, 2003 IEEE Workshop on, Sept 2003, pp. 262–265.
  • [12] G. Tzagkarakis, B. Beferull-Lozano, and P. Tsakalides, “Rotation-invariant texture retrieval with gaussianized steerable pyramids,” Image Processing, IEEE Transactions on, vol. 15, no. 9, pp. 2702–2718, Sept 2006.
  • [13] S. Choy and C. Tong, “Statistical wavelet subband characterization based on generalized gamma density and its application in texture retrieval,” Image Processing, IEEE Transactions on, vol. 19, no. 2, pp. 281–289, Feb 2010.
  • [14] M. Allili, “Wavelet modeling using finite mixtures of generalized gaussian distributions: Application to texture discrimination and retrieval,” Image Processing, IEEE Transactions on, vol. 21, no. 4, pp. 1452–1464, April 2012.
  • [15] R. Kwitt and A. Uhl, “Image similarity measurement by kullback-leibler divergences between complex wavelet subband statistics for texture retrieval,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on.   IEEE, 2008, pp. 933–936.
  • [16] S. Mallat, “Group invariant scattering,” Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012.
  • [17] L. Sifre and S. Mallat, “Combined scattering for rotation invariant texture analysis,” in

    European Symposium on Artificial Neural Networks

    , 2012.
  • [18] ——, “Rotation, scaling and deformation invariant scattering for texture discrimination,” in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on.   IEEE, 2013, pp. 1233–1240.
  • [19] J. Bruna, “Scattering representations for recognition,” Ph.D. dissertation, CMAP, Ecole Polytechnique, 2012.
  • [20] S. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 11, no. 7, pp. 674–693, Jul 1989.
  • [21] ——, A Wavelet Tour of Signal Processing - The Sparse Way.   Elsevier Science, 2008, pp. 89–376.
  • [22] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 8, pp. 1872–1886, 2013.
  • [23] J. Andèn and S. Mallat, “Deep scattering spectrum,” IEEE Transactions on Signal Processing, 2013.
  • [24] I. Waldspurger, A. d’Aspremont, and S. Mallat, “Phase recovery, maxcut and complex semidefinite programming,” Mathematical Programming, pp. 1–35, 2012.
  • [25] C. Bauckhage, “Computing the Kullback-Leibler Divergence between two Weibull Distributions,” ArXiv e-prints, 2013.
  • [26] T. Jebara, R. Kondor, and A. Howard, “Probability product kernels,” The Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004.
  • [27] W. Torgerson, Theory and methods of scaling.   Wiley, 1958.
  • [28] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Texture image retrieval using new rotated complex wavelet filters,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 35, no. 6, pp. 1168–1178, 2005.
  • [29] D. Sornette, Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization and Disorder: Concepts and Tools, ser. Springer Series in Synergetics.   Springer, 2006, pp. 185–194.