Painting Analysis Using Wavelets and Probabilistic Topic Models

In this paper, computer-based techniques for stylistic analysis of paintings are applied to the five panels of the 14th century Peruzzi Altarpiece by Giotto di Bondone. Features are extracted by combining a dual-tree complex wavelet transform with a hidden Markov tree (HMT) model. Hierarchical clustering is used to identify stylistic keywords in image patches, and keyword frequencies are calculated for sub-images that each contains many patches. A generative hierarchical Bayesian model learns stylistic patterns of keywords; these patterns are then used to characterize the styles of the sub-images; this in turn, permits to discriminate between paintings. Results suggest that such unsupervised probabilistic topic models can be useful to distill characteristic elements of style.



There are no comments yet.


page 4


Combining Thesaurus Knowledge and Probabilistic Topic Models

In this paper we present the approach of introducing thesaurus knowledge...

Keyword-based Topic Modeling and Keyword Selection

Certain type of documents such as tweets are collected by specifying a s...

Keyword Assisted Topic Models

For a long time, many social scientists have conducted content analysis ...

Query Generation for Patent Retrieval with Keyword Extraction based on Syntactic Features

This paper describes a new method to extract relevant keywords from pate...

QDR-Tree: An Efcient Index Scheme for Complex Spatial Keyword Query

With the popularity of mobile devices and the development of geo-positio...

Document Generation with Hierarchical Latent Tree Models

In most probabilistic topic models, a document is viewed as a collection...

4D Dual-Tree Complex Wavelets for Time-Dependent Data

The dual-tree complex wavelet transform (DT-ℂWT) is extended to the 4D s...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In recent years wavelet methods have contributed to art history through their application to forgery detection [1], linking of underdrawing and overpainting [2], and uncovering elements of style [3, 4]

. The wavelet transform is a powerful tool which captures local features of an image at different resolutions; the characteristics of wavelet coefficients at different scales can be analyzed via hidden Markov tree models. When applied to brushwork, the combination of these feature extraction methods and machine learning algorithms is able to separate high-resolution digital scans of paintings by Vincent van Gogh from paintings by other artists (see Johnson et al.


). Given the features extracted from the wavelet coefficients of the scanned images, several different classifiers can be usefully applied (see

[2, 3] for examples).

In this paper we seek to uncover the intrinsic styles in a dataset provided by the North Carolina Museum of Art, consisting of high-resolution scans of five panels attributed to Giotto di Bondone. In contrast to prior work [2], we lack side information such as underdrawings of different styles. As in [1, 2], we extract features from the dataset by combining a dual-tree complex wavelet transform [6] with hidden Markov trees [7]. In the absence of a training set (such as a set of overpainting patches labeled by underdrawing styles in [2]

), we cannot apply supervised learning algorithms. Our main contribution is to use probabilistic topic models to model different styles present in each painting, where a style corresponds to a

topic in the bag of words model. This produces a meaningful discrimination between the five paintings based on the style distribution in each painting.

This paper is organized as follows. Section 2 reviews some background of wavelet-based HMT models applied to feature representation. Section 3 introduces our post analysis strategy via topic models. Results and discussion are provided in Section 4; conclusions and future work are in Section 5.

2 Feature Extraction

2.1 Color Representation

We first introduce the HSL color representation, commonly used in computer graphics, and easily computed from RGB; it gives a perceptually more accurate representation of color relationships, superior to RGB. HSL [8] describes colors by their hue (coded by an angle on the color wheel, from 0 to 360), saturation (ranging from 0 to 1) and lightness (again from 0 to 1); this color space is often represented as a double-cone. This double-cone can be expressed in Cartesian coordinates (XYZ), computed from HSL values by setting


2.2 Wavelet Transforms

Wavelet transforms allow us to analyze images at different resolutions. By their finite support, wavelets are able to capture information specific to a particular location. Multiresolution analysis then provides information specific to both location and resolution or scale. The clustering and persistence properties of wavelet transforms provide us with statistical models to characterize the dependencies between wavelet coefficients [9]; see also Section 2.3.

In the dual-tree complex wavelet transform (CWT) [6], two separate discrete wavelet transforms (DWT) decompose signals into real and imaginary parts. The magnitude of each complex wavelet coefficient is nearly shift invariant, rendering CWT insensitive to small image shifts. The dual-tree CWT has six subbands and provides greater orientation selectivity than DWT. It represents local differences in terms of six basic directions, making it possible to capture brushstroke information specific to location, scale and orientation.

2.3 Hidden Markov Trees

The clustering and persistence properties of wavelet coefficients suggest the use of an HMT model to capture the intrinsic statistical structure of wavelet coefficients of an image; this provides an efficient dimensionality reduction technique that is robust to noise induced by the scanning process [7].

At each level, the wavelet coefficients are modeled as a mixture of two Gaussian distributions: one, with large variance, corresponds to the wavelet component with high energy; the other, with small variance, corresponds to smooth components. Each HMT model thus has three parameters:

  • , state transition probability between successive scales

  • , variance of the narrow Gaussian distribution

  • , variance of the wide Gaussian distribution

Input: A set of painting images , .
Initialize: Divide images into patches with pixels overlap, both vertically and horizontally. We term each patch , where is the number of patches in image .

1:  for each patch  do
2:  Convert into XYZ domain; normalize each domain.
3:  Calculate complex wavelet coefficients and norms .
4:  Compute HMT parameters .
5:  end for


: A set of feature vectors


Algorithm 1 Feature Extraction

We divide paintings into patches of size

; features are extracted independently for each. HMT parameters are estimated by iterative Expectation Maximization (EM) method (see details in

[9]). The feature vector of each patch has entries: 2 variances for each of 6 scales in each of 6 subbands, and 2 (non-redundant) transition probabilities for each of 4 finest scales. (See Algorithm 1.)

3 Topic Modeling Based Style Analysis

The “signature features” making up the feature vectors can be viewed as primitive building blocks of the style of an artist; depending on the style of each artist, these signatures appear with different proportions [3, 5]. Usage patterns representing these proportions can be learned from the data. In a departure from previous work, we propose to learn these usage patterns, using Latent Dirichlet Allocation (LDA), a probabilistic model for discrete bag of words data. The choice of stylistic elements in the creation of a painting is here viewed as similar to the choice of words in the process of creating a literary text. Learned topics, or stylistic patterns, are weights over the stylistic words; each painting is represented by weighted combinations of these patterns. This approach is similar to the bag of words model for object recognition in [10] which represents images with different proportions of object recognition feature patterns.

Since there are only five paintings, we divide each panel into sub-images with no overlap (these sub-images are larger than the small patches we used for feature extraction in Section 2). We consider each sub-image as an independent representation of the style of that panel’s artist. We begin with definitions and show the mathematical formulation afterwards.

  • The basic unit of a sub-image is a single patch , characterized by keywords indexed by . (We will explain below how to generate these keywords.) A patch associated to the th keyword () is represented by a dimensional vector with all but the th entry zero, and . Two patches with the same keyword have the same drawing style.

  • A sub-image is composed of a sequence of patches (equivalent to one article in the LDA model in [11]).

  • An album is a collection of sub-images, corresponding to a “corpus” in [11].

Input: All feature vectors .
Initialize: Set , .

Repeat until :

1:  for  do
2:  .
3:  Divide into 2 subsets and using -means algorithm, i.e., and .
4:  Label Update: If , . If , then .
5:  end for

Set .
Output: A set of labels .

Algorithm 2 Keyword Labeling

To generate the keywords, proceed as follows: cluster all patches with a divisive hierarchical clustering method, constituting a fixed vocabulary of keywords in which patch is assigned a label . This is a form of vector quantization; it represents each sub-image as a bag of words, see Algorithm 2. The binary structure of the labeling procedure ensures that labels that share dominant digit in their binary expansions are similar. “Stylistic patterns” correspond to distribution over “keywords”; our approach assumes that two sub-images having a similar distribution over keywords are similar in style.

The Dirichlet priors that control each sub-image/pattern distribution and pattern/keyword distribution have parameters and . These parameters are sampled once when generating a collection of images. If there is a collection of stylistic patterns, then the

-dimensional Dirichlet random variable

, describing the pattern proportion for one sub-image, is sampled once per sub-image. The probability density function of

is given by


To generate a patch in a sub-image, the “artistic process” first chooses a stylistic pattern based on ; this is a -dimensional unit vector with if the th pattern is selected. The pattern-keyword distribution is a matrix of size , with column representing the keyword distribution in pattern . Then the process produces a patch ; note that . Given and

, the joint distribution of this generative model can be written as

Figure 1: Graphical model for the generative “painting process”. Only one node is observed (shaded): the patches in sub-images. Boxes denote repeated processes. The different plates: is the collection of patches, is the stylistic patterns, is the collection of sub-images.

A graphical description and dependencies of the generative model is shown in Figure 1. The only observed variable is the patch, the hidden variables are sub-image/pattern distributions, patterns and pattern assignments. One needs to infer the hidden variables based on the observed variables, i.e., our goal is to compute and from . In [11], Blei et al. proposed an efficient variational EM algorithm for inference and parameter estimation; we use the Latent Dirichlet Allocation package [12] for variational Bayesian (VB) implementation of LDA.

4 Results and Discussion

We divide the scans of the 5 panels into 221 non-overlapping sub-images, each of size pixels. In the feature extraction step we divide each sub-image into small patches with 32 pixels overlap, so there are patches in one sub-image. Each patch is associated with one keyword label; we observe the statistical keyword distribution in each sub-image. The number of stylistic patterns, , is set to be 20.

Figure 2: Keyword distribution of some stylistic patterns. Left: patterns 6, 8, 9. Middle: patterns 10, 12, 15. Right: patterns 16, 18, 20.
Figure 3: Stylistic pattern distributions in the five panels. For each painting, the x-axis denotes the pattern label; the y-axis corresponds to the total weight of each pattern in the painting.
Figure 4: (a) Illustration of the relative importance of patterns 6 and 8 in sub-images in the 5 panels. (b) Same for patterns 16, 18, 19 and 20. (c) Similarity of sub-images visualized via t-SNE; sub-images from the same painting are dots of the same color.

We obtain two main parameters from LDA inference: the distribution function over the keywords for each pattern, and the weights over patterns in each sub-image. The keyword distributions for some patterns are in Figure 2; each pattern is a sparse combination of the keywords in the vocabulary. Some patterns are highly weighted over keywords with labels with the same dominant binary digits (for example, patterns 16 and 18 are both concentrated on keywords indexed from 128 to 255); these patterns are thus similar (see Section 3) .

The pattern distribution for each of the five panels is obtained by adding the pattern weights of all its sub-images and normalizing the weights. Figure 3 shows, e.g., that patterns 6 and 8 are more heavily represented in panel 5 than in the other four. Figure 4 visualizes this in terms of the original panels: each sub-image in each panel is shown with a brightness proportional to the sum of the weights of patterns 6 and 8 in that sub-image. Figure 4 does the same for patterns 16, 18, 19 and 20, which are clearly more heavily represented in panel 2.

Figure 4 illustrates the differences between the panels differently. Each sub-image is characterized by a 20-dimensional pattern distribution vector . We use the t-SNE algorithm of [13] to visualize all sub-images by projecting the pattern distribution matrix onto an optimal two-dimensional plane. Most of the blue dots (sub-images of panel 5) are located in one ellipse and a large percentage of the pink dots (panel 2) in another, disjoint ellipse; the sub-images in other three paintings are widely distributed in the plane. Once again panels 2 and 5 (the Virgin Mary and St. Francis) stand out.

5 Conclusions

In this paper, we motivated and introduced a successful application of probabilistic topic modeling in painting analysis.

The results are preliminary but promising. This project started when two of us (D. S. and W. B.) asked the other authors whether the techniques of [2] could be used to quantify stylistic differences art historians perceive among the five panels of the Peruzzi Altarpiece. It is known that the panels were painted on wood from one single plank, that they all were prepared in the same way, that the underdrawings are similar; a recent study by the Getty Institute also showed similarities between pigments in the red, pink, blue, brown hair, and flesh paints. Yet the same study also found differences making especially the panel depicting St. Francis (panel 5 in our numbering) stand out. It is intriguing that this painting likewise stood out most in our analysis, as being “stylistically” more different. On the other hand, the Getty study found more reason to single out the panel of John the Baptist (panel 4) than the Virgin Mary (panel 2).

Future work includes increasing the adaptivity of our model in the area of painting analysis, restricting it to subsets of the panels (only faces, or hands, or draped fabric, or hair). We intend to check, in particular, to what extent specific elements can be separated by our methods; concentrating on hair, for instance, the thin parallel brush work in the mustache/beard in Christ (panel 3) and John the Baptist (panel 4) seems, to the art historian’s eye, possibly from a different hand than other parts of the altarpiece, and it would be interesting to see whether this is mirrored by a more adapted image analysis of the type described in this paper. Detailed images of the same five panels with other imaging modalities, such as X-ray radiography and infrared reflectography, are also available; we would like to fold the results of all these techniques together in a more ambitious, multispectral analysis. Finally, there have been several very interesting articles in the art history literature which attempt to establish similarities and differences in techniques between those paintings thought to be by Giotto and others in the workshop or followers; in this framework, it would be useful to extend our work to this larger body of paintings.


  • [1] G. Polatkan, S. Jafarpour, A. Brasoveanu, S. Hughes, and I. Daubechies, “Detection of forgery in paintings using supervised learning,” in IEEE international conference on Image processing (ICIP), 2009, pp. 2885–2888.
  • [2] J. Wolff, M. Martens, S. Jafarpour, I. Daubechies, and R. Calderbank, “Uncovering elements of style,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011, pp. 1017–1020.
  • [3] S. Jafarpour, G. Polatkan, E. Brevdo, S. Hughes, A. Brasoveanu, and I. Daubechies, “Stylistic Analysis of Paintings using Wavelets and Machine Learning,” in European Signal Processing Conference (EUSIPCO), 2009.
  • [4] J. Li, L. Yao, E. Hendriks, and J. Z. Wang, “Rhythmic Brushstrokes Distinguish van Gogh from His Contemporaries: Findings via Automated Brushstroke Extraction,” IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 34, no. 6, pp. 1159–1176, June 2012.
  • [5] C. R. Johnson, E. Hendriks, I. Berezhnoy, E. Brevdo, S. Hughes, I. Daubechies, J. Li, E. Postma, and J. Z. Wang, “Image Processing for Artist Identification: Computerized Analysis of Vincent van Gogh’s Painting Brushstrokes,” IEEE Signal Processing Magazine, July 2008.
  • [6] I. W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury, “The dual-tree complex wavelet transform,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 123–151, Nov. 2005.
  • [7] H. Choi, J. K. Romberg, R. G. Baraniuk, and N. G. Kingsbury, “Hidden Markov Tree Modeling of Complex Wavelet Transforms,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp. 133–136.
  • [8] M. K. Agoston, Computer Graphics and Geometric Modeling: Implementation & Algorithms, Springer, 2005.
  • [9] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk,

    “Wavelet-based statistical signal processing using hidden Markov models,”

    IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886–902, Apr. 1998.
  • [10] F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in

    IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)

    , 2005, pp. 524–531.
  • [11] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  • [12] D. Mochihashi, “lda, a Latent Dirichlet Allocation package,” 2012,
  • [13] L. van der Maaten and G. E. Hinton, “Visualizing Data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, Nov. 2008.