1 Introduction
In recent years wavelet methods have contributed to art history through their application to forgery detection [1], linking of underdrawing and overpainting [2], and uncovering elements of style [3, 4]. The wavelet transform is a powerful tool that captures local features of an image at different resolutions; the characteristics of wavelet coefficients across scales can be analyzed via hidden Markov tree models. Applied to brushwork, the combination of these feature extraction methods with machine learning algorithms is able to separate high-resolution digital scans of paintings by Vincent van Gogh from paintings by other artists (see Johnson et al. [5]). Given the features extracted from the wavelet coefficients of the scanned images, several different classifiers can be usefully applied (see [2, 3] for examples).
In this paper we seek to uncover the intrinsic styles in a dataset provided by the North Carolina Museum of Art, consisting of high-resolution scans of five panels attributed to Giotto di Bondone. In contrast to prior work [2], we lack side information such as underdrawings of different styles. As in [1, 2], we extract features from the dataset by combining a dual-tree complex wavelet transform [6] with hidden Markov trees [7]. In the absence of a training set (such as the set of overpainting patches labeled by underdrawing styles in [2]), we cannot apply supervised learning algorithms. Our main contribution is the use of probabilistic topic models to model the different styles present in each painting, where a style corresponds to a topic in the bag-of-words model. This produces a meaningful discrimination between the five paintings based on the style distribution in each painting.
2 Feature Extraction
2.1 Color Representation
We first introduce the HSL color representation, commonly used in computer graphics and easily computed from RGB; it gives a perceptually more accurate representation of color relationships than RGB. HSL [8] describes colors by their hue (coded as an angle on the color wheel, from 0 to 360 degrees), saturation (ranging from 0 to 1) and lightness (also from 0 to 1); this color space is often represented as a double-cone. The double-cone can be expressed in Cartesian coordinates (XYZ), computed from the HSL values by setting
(1)  
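Since the conversion formula itself is not reproduced above, the sketch below shows one common double-cone embedding of HSL; this mapping is an assumption for illustration, not necessarily the exact form of Eq. (1). The radius shrinks to zero at black (L = 0) and white (L = 1) and is widest at mid-lightness.

```python
import math

def hsl_to_xyz(h_deg, s, l):
    # One common double-cone embedding (an assumption; the paper's Eq. (1)
    # may differ): radial distance from the lightness axis vanishes at the
    # black and white tips and is largest at L = 1/2.
    r = s * (1.0 - abs(2.0 * l - 1.0))
    h = math.radians(h_deg)
    return (r * math.cos(h), r * math.sin(h), l)

white = hsl_to_xyz(0, 0.0, 1.0)   # tip of the upper cone
red = hsl_to_xyz(0, 1.0, 0.5)     # saturated red on the widest circle
```

Under this mapping, all fully light or fully dark colors collapse to the cone tips regardless of hue, which matches the double-cone picture of the HSL space.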
2.2 Wavelet Transforms
Wavelet transforms allow us to analyze images at different resolutions. Because of their finite support, wavelets are able to capture information specific to a particular location. Multiresolution analysis then provides information specific to both location and resolution (scale). The clustering and persistence properties of wavelet transforms provide us with statistical models to characterize the dependencies between wavelet coefficients [9]; see also Section 2.3.
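As a minimal illustration of multiresolution analysis (using the plain Haar wavelet rather than the complex transform used in this paper), the sketch below decomposes a signal into a coarse average plus localized detail coefficients at each scale; note how the nonzero detail coefficients sit exactly where the signal changes.

```python
def haar_step(signal):
    """One level of the (unnormalised) Haar transform: pairwise averages and differences."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_multires(signal):
    """Full decomposition: coarsest average plus details from fine to coarse scale."""
    details = []
    while len(signal) > 1:
        signal, d = haar_step(signal)
        details.append(d)
    return signal[0], details

# The jumps at positions 2 and 4 show up as localized detail coefficients.
coarse, details = haar_multires([4, 4, 2, 2, 8, 8, 8, 8])
```

The finest-scale details are all zero (the signal is locally constant over each pair), while the coarser scales localize the two jumps, which is the locality-plus-scale information the text describes.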
In the dual-tree complex wavelet transform (CWT) [6], two separate discrete wavelet transforms (DWT) decompose signals into real and imaginary parts. The magnitude of each complex wavelet coefficient is nearly shift invariant, rendering the CWT insensitive to small image shifts. The dual-tree CWT has six subbands and provides greater orientation selectivity than the DWT. It represents local differences in terms of six basic directions, making it possible to capture brushstroke information specific to location, scale and orientation.
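The near shift invariance of complex-coefficient magnitudes can be illustrated with a toy one-dimensional example. This uses a single Gabor-like complex filter, not the actual dual-tree construction: shifting the input by one sample barely changes the response magnitude, while the real part alone swings considerably.

```python
import cmath, math

def gabor_response(signal, center, omega=1.0, width=8.0):
    """Inner product with a Gabor-like complex filter centred at `center`."""
    resp = 0j
    for n, x in enumerate(signal):
        t = n - center
        resp += x * cmath.exp(1j * omega * t) * math.exp(-t * t / (2 * width * width))
    return resp

# A smooth bump signal and a copy shifted by one sample.
N = 64
signal = [math.exp(-((n - 32) ** 2) / 18.0) for n in range(N)]
shifted = [0.0] + signal[:-1]

r0 = gabor_response(signal, 32)
r1 = gabor_response(shifted, 32)

mag_change = abs(abs(r1) - abs(r0)) / abs(r0)    # magnitude barely moves
real_change = abs(r1.real - r0.real) / abs(r0)   # real part rotates with phase
```

This is only an analogy for why taking magnitudes of complex coefficients (as the dual-tree CWT permits) yields features robust to small misregistrations of the scans.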
2.3 Hidden Markov Trees
The clustering and persistence properties of wavelet coefficients suggest the use of a hidden Markov tree (HMT) model to capture the intrinsic statistical structure of the wavelet coefficients of an image; this provides an efficient dimensionality reduction technique that is robust to noise induced by the scanning process [7].
At each level, the wavelet coefficients are modeled as a mixture of two Gaussian distributions: one, with large variance, corresponds to the wavelet components with high energy; the other, with small variance, corresponds to smooth components. Each HMT model thus has three parameters:
- the state transition probabilities between successive scales;
- the variance of the narrow Gaussian distribution;
- the variance of the wide Gaussian distribution.
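The two-component Gaussian mixture at the heart of this model can be sketched as follows. This is a deliberately simplified stand-in that ignores the tree-structured state transitions and fits each coefficient independently by EM; the full HMT couples the hidden states across scales.

```python
import math, random

def em_two_gaussians(coeffs, iters=50):
    """EM for a zero-mean two-component Gaussian mixture (small/large variance)."""
    var_s, var_l, pi = 0.01, 1.0, 0.5   # initial small/large variances, mixing weight
    for _ in range(iters):
        # E-step: posterior probability that each coefficient is in the "large" state
        resp = []
        for c in coeffs:
            p_l = pi * math.exp(-c * c / (2 * var_l)) / math.sqrt(var_l)
            p_s = (1 - pi) * math.exp(-c * c / (2 * var_s)) / math.sqrt(var_s)
            resp.append(p_l / (p_l + p_s))
        # M-step: re-estimate mixing weight and the two variances
        n_l = sum(resp)
        pi = n_l / len(coeffs)
        var_l = sum(r * c * c for r, c in zip(resp, coeffs)) / n_l
        var_s = sum((1 - r) * c * c for r, c in zip(resp, coeffs)) / (len(coeffs) - n_l)
    return var_s, var_l, pi

random.seed(0)
# Synthetic "wavelet coefficients": mostly smooth (small variance), some edges (large)
coeffs = [random.gauss(0, 0.1) for _ in range(400)] + [random.gauss(0, 2.0) for _ in range(100)]
var_s, var_l, pi = em_two_gaussians(coeffs)
```

The fitted small and large variances, plus the state probabilities, are exactly the kind of quantities collected into the per-patch feature vector described next.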
We divide the paintings into patches of fixed size; features are extracted independently for each patch. HMT parameters are estimated by an iterative Expectation-Maximization (EM) procedure (see [9] for details). The feature vector of each patch contains 2 variances for each of 6 scales in each of 6 subbands, and 2 (nonredundant) transition probabilities for each of the 4 finest scales. (See Algorithm 1.)
3 Topic Modeling Based Style Analysis
The “signature features” making up the feature vectors can be viewed as primitive building blocks of the style of an artist; depending on the style of each artist, these signatures appear in different proportions [3, 5]. Usage patterns representing these proportions can be learned from the data. In a departure from previous work, we propose to learn these usage patterns using Latent Dirichlet Allocation (LDA), a probabilistic model for discrete bag-of-words data. The choice of stylistic elements in the creation of a painting is here viewed as analogous to the choice of words in the creation of a literary text. Learned topics, or stylistic patterns, are weightings over the stylistic words; each painting is represented by weighted combinations of these patterns. This approach is similar to the bag-of-words model for object recognition in [10], which represents images by different proportions of object recognition feature patterns.
Since there are only five paintings, we divide each panel into nonoverlapping subimages (these subimages are larger than the small patches used for feature extraction in Section 2). We consider each subimage an independent representation of the style of that panel’s artist. We begin with definitions and give the mathematical formulation afterwards.

- The basic unit of a subimage is a single patch w, characterized by a keyword index v ∈ {1, …, V}. (We explain below how the keywords are generated.) A patch associated with the v-th keyword is represented by a V-dimensional vector w whose entries are all zero except the v-th, which equals 1. Two patches with the same keyword have the same drawing style.
- A subimage is composed of a sequence of N patches (equivalent to one article in the LDA model of [11]).
- An album is a collection of subimages, corresponding to a “corpus” in [11].
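In code, the patch/keyword representation above amounts to one-hot vectors, and a subimage reduces to a bag-of-words count; the vocabulary size and keyword labels below are toy values for illustration.

```python
def one_hot(v, V):
    """Patch with keyword v as a V-dimensional indicator vector."""
    w = [0] * V
    w[v] = 1
    return w

# A subimage is the sequence of its patches' keyword vectors;
# its bag-of-words summary is the per-keyword count.
V = 8
subimage = [one_hot(v, V) for v in [3, 3, 5, 0]]
bag = [sum(col) for col in zip(*subimage)]
```

The ordering of patches within a subimage is discarded by the bag, which is exactly the exchangeability assumption LDA relies on.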
To generate the keywords, we proceed as follows: cluster all patches with a divisive hierarchical clustering method; the resulting clusters constitute a fixed vocabulary of V keywords, and each patch is assigned a label v ∈ {1, …, V}. This is a form of vector quantization; it represents each subimage as a bag of words (see Algorithm 2). The binary structure of the labeling procedure ensures that patches whose labels share the dominant digits of their binary expansions are similar. “Stylistic patterns” correspond to distributions over keywords; our approach assumes that two subimages with similar distributions over keywords are similar in style.
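A toy version of the divisive labeling step, on scalar features instead of the full HMT feature vectors, can be sketched as follows; it makes the “shared dominant digits” property concrete.

```python
def divisive_labels(values, depth):
    """Toy divisive hierarchical clustering of scalar features.

    A 1-D stand-in for clustering the HMT feature vectors: each split
    separates points below/above the cluster mean and appends one binary
    digit, so nearby points share the leading digits of their labels.
    """
    def split(idx, d):
        if d == 0 or len(idx) <= 1:
            return {i: "" for i in idx}
        mean = sum(values[i] for i in idx) / len(idx)
        lo = [i for i in idx if values[i] <= mean]
        hi = [i for i in idx if values[i] > mean]
        if not lo or not hi:  # all points identical: stop refining
            return {i: "" for i in idx}
        out = {}
        for side, prefix in ((lo, "0"), (hi, "1")):
            for i, suffix in split(side, d - 1).items():
                out[i] = prefix + suffix
        return out
    return split(range(len(values)), depth)

# Two tight groups of features: labels within a group share the leading digit.
labels = divisive_labels([0.10, 0.15, 0.90, 0.95], depth=2)
```

With depth d the vocabulary has up to 2**d keywords, and the leading digits of a label record which side of each coarse split a patch fell on.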
The Dirichlet prior that controls the subimage/pattern distribution has parameter α; together with the pattern/keyword parameter β introduced below, it is fixed once for the whole collection of images. If there are k stylistic patterns, then the k-dimensional Dirichlet random variable θ, describing the pattern proportions for one subimage, is sampled once per subimage. The probability density function of θ is given by
p(θ | α) = (Γ(α_1 + ⋯ + α_k) / (Γ(α_1) ⋯ Γ(α_k))) θ_1^{α_1 − 1} ⋯ θ_k^{α_k − 1}.  (2)
To generate a patch in a subimage, the “artistic process” first chooses a stylistic pattern z based on θ; z is a k-dimensional unit vector with z^i = 1 if the i-th pattern is selected. The pattern/keyword distribution is a matrix β, with each column representing the keyword distribution of one pattern. The process then produces a patch w; note that p(w^v = 1 | z^i = 1) = β_{vi}. Given α and β, the joint distribution of this generative model can be written as
p(θ, z_{1:N}, w_{1:N} | α, β) = p(θ | α) ∏_{n=1}^{N} p(z_n | θ) p(w_n | z_n, β).  (3)
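The generative process behind Eqs. (2) and (3) can be sketched directly; the values of k, V and α below, and the uniform β, are toy assumptions for illustration.

```python
import random

def sample_dirichlet(alpha):
    """Draw theta ~ Dirichlet(alpha) via normalised Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs):
    """Draw an index with the given probabilities."""
    u, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

def generate_subimage(alpha, beta, n_patches):
    """theta ~ Dir(alpha); per patch, a pattern z ~ Cat(theta), then a keyword
    w ~ Cat(beta[z]) -- the factorisation of the joint distribution above."""
    theta = sample_dirichlet(alpha)
    patches = []
    for _ in range(n_patches):
        z = sample_categorical(theta)
        w = sample_categorical(beta[z])
        patches.append((z, w))
    return theta, patches

random.seed(1)
k, V = 3, 8                                 # 3 stylistic patterns, 8 keywords (toy sizes)
alpha = [0.5] * k
beta = [[1.0 / V] * V for _ in range(k)]    # uniform pattern/keyword rows, for illustration
theta, patches = generate_subimage(alpha, beta, n_patches=20)
```

Inference goes in the opposite direction: given only the keywords w, one recovers the hidden θ and z, as discussed next.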
The graphical structure and dependencies of the generative model are shown in Figure 1. The only observed variables are the patches; the hidden variables are the subimage/pattern distributions, the patterns, and the pattern assignments. One needs to infer the hidden variables from the observed ones, i.e., our goal is to compute the posterior distribution of θ and z given the observed patches w. In [11], Blei et al. proposed an efficient variational EM algorithm for inference and parameter estimation; we use the Latent Dirichlet Allocation package [12] for its variational Bayesian (VB) implementation of LDA.
4 Results and Discussion
We divide the scans of the 5 panels into 221 nonoverlapping subimages of equal size. In the feature extraction step we divide each subimage into small patches with a 32-pixel overlap. Each patch is associated with one keyword label, and we observe the empirical keyword distribution in each subimage. The number of stylistic patterns, k, is set to 20.
We obtain two sets of parameters from the LDA inference: the distribution over keywords for each pattern, and the weights over patterns for each subimage. The keyword distributions for some patterns are shown in Figure 2; each pattern is a sparse combination of the keywords in the vocabulary. Some patterns are heavily weighted on keywords whose labels share the same dominant binary digits (for example, patterns 16 and 18 are both concentrated on keywords indexed from 128 to 255); such patterns are thus similar (see Section 3).
The pattern distribution for each of the five panels is obtained by adding the pattern weights of all its subimages and normalizing the weights. Figure 3 shows, e.g., that patterns 6 and 8 are more heavily represented in panel 5 than in the other four. Figure 4 visualizes this in terms of the original panels: each subimage in each panel is shown with a brightness proportional to the sum of the weights of patterns 6 and 8 in that subimage. Figure 4 does the same for patterns 16, 18, 19 and 20, which are clearly more heavily represented in panel 2.
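The panel-level aggregation described above is a simple sum-and-normalize over the subimage weight vectors; a minimal sketch (with made-up weights):

```python
def panel_distribution(subimage_weights):
    """Sum the per-subimage pattern weights of a panel and renormalise."""
    k = len(subimage_weights[0])
    totals = [sum(w[i] for w in subimage_weights) for i in range(k)]
    s = sum(totals)
    return [t / s for t in totals]

# Two hypothetical subimages of one panel, each with 3 pattern weights.
panel = panel_distribution([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

The result is again a probability distribution over patterns, so panels can be compared directly regardless of how many subimages each contains.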
Figure 4 also illustrates the differences between the panels in another way. Each subimage is characterized by its 20-dimensional pattern distribution vector θ. We use the t-SNE algorithm of [13] to embed the pattern distribution vectors of all subimages in a two-dimensional plane. Most of the blue dots (subimages of panel 5) lie within one ellipse and a large percentage of the pink dots (panel 2) within another, disjoint ellipse; the subimages of the other three panels are widely scattered in the plane. Once again panels 2 and 5 (the Virgin Mary and St. Francis) stand out.
5 Conclusions
In this paper, we motivated and introduced a successful application of probabilistic topic modeling in painting analysis.
The results are preliminary but promising. This project started when two of us (D. S. and W. B.) asked the other authors whether the techniques of [2] could be used to quantify stylistic differences art historians perceive among the five panels of the Peruzzi Altarpiece. It is known that the panels were painted on wood from a single plank, that they were all prepared in the same way, and that the underdrawings are similar; a recent study by the Getty Institute also showed similarities between the pigments in the red, pink, blue, brown hair, and flesh paints. Yet the same study also found differences, making especially the panel depicting St. Francis (panel 5 in our numbering) stand out. It is intriguing that this panel likewise stood out most in our analysis as being “stylistically” more different. On the other hand, the Getty study found more reason to single out the panel of John the Baptist (panel 4) than that of the Virgin Mary (panel 2).
Future work includes increasing the adaptivity of our model by restricting the analysis to subsets of the panels (only faces, or hands, or draped fabric, or hair). We intend to check, in particular, to what extent specific elements can be separated by our methods; concentrating on hair, for instance, the thin parallel brushwork in the mustache/beard of Christ (panel 3) and John the Baptist (panel 4) seems, to the art historian’s eye, possibly from a different hand than other parts of the altarpiece, and it would be interesting to see whether this is mirrored by a more adapted image analysis of the type described in this paper. Detailed images of the same five panels in other imaging modalities, such as X-ray radiography and infrared reflectography, are also available; we would like to fold the results of all these techniques together in a more ambitious, multispectral analysis. Finally, several very interesting articles in the art history literature attempt to establish similarities and differences in technique between paintings thought to be by Giotto and others by his workshop or followers; it would be useful to extend our work to this larger body of paintings.
References
 [1] G. Polatkan, S. Jafarpour, A. Brasoveanu, S. Hughes, and I. Daubechies, “Detection of forgery in paintings using supervised learning,” in IEEE International Conference on Image Processing (ICIP), 2009, pp. 2885–2888.
 [2] J. Wolff, M. Martens, S. Jafarpour, I. Daubechies, and R. Calderbank, “Uncovering elements of style,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011, pp. 1017–1020.
 [3] S. Jafarpour, G. Polatkan, E. Brevdo, S. Hughes, A. Brasoveanu, and I. Daubechies, “Stylistic Analysis of Paintings using Wavelets and Machine Learning,” in European Signal Processing Conference (EUSIPCO), 2009.
 [4] J. Li, L. Yao, E. Hendriks, and J. Z. Wang, “Rhythmic Brushstrokes Distinguish van Gogh from His Contemporaries: Findings via Automated Brushstroke Extraction,” IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 34, no. 6, pp. 1159–1176, June 2012.
 [5] C. R. Johnson, E. Hendriks, I. Berezhnoy, E. Brevdo, S. Hughes, I. Daubechies, J. Li, E. Postma, and J. Z. Wang, “Image Processing for Artist Identification: Computerized Analysis of Vincent van Gogh’s Painting Brushstrokes,” IEEE Signal Processing Magazine, July 2008.
 [6] I. W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury, “The dual-tree complex wavelet transform,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 123–151, Nov. 2005.
 [7] H. Choi, J. K. Romberg, R. G. Baraniuk, and N. G. Kingsbury, “Hidden Markov Tree Modeling of Complex Wavelet Transforms,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp. 133–136.
 [8] M. K. Agoston, Computer Graphics and Geometric Modeling: Implementation & Algorithms, Springer, 2005.

 [9] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based statistical signal processing using hidden Markov models,” IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886–902, Apr. 1998.
 [10] F. Li and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 524–531.
 [11] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
 [12] D. Mochihashi, “lda, a Latent Dirichlet Allocation package,” 2012, http://chasen.org/~daitim/dist/lda/.
 [13] L. van der Maaten and G. E. Hinton, “Visualizing Data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, Nov. 2008.