Neural Style Representations and the Large-Scale Classification of Artistic Style

11/16/2016 ∙ by Jeremiah Johnson, et al. ∙ University of New Hampshire

The artistic style of a painting is a subtle aesthetic judgment used by art historians for grouping and classifying artwork. The recently introduced ‘neural-style’ algorithm substantially succeeds in merging the perceived artistic style of one image or set of images with the perceived content of another. In light of this and other recent developments in image analysis via convolutional neural networks, we investigate the effectiveness of a ‘neural-style’ representation for classifying the artistic style of paintings.




1 Introduction

Any observer can sense the artistic style of a painting, even if it takes training to articulate it. To an art historian, the artistic style is the primary means of classifying the painting [10]. However, artistic style is not well defined, and may be loosely described as “... a distinctive manner which permits the grouping of works into related categories” [1]. Algorithmically determining the artistic style of an artwork is a challenging problem which may include analysis of features such as the painting’s color, its texture, and its subject matter, or none of those at all. Detecting the style of a digitized image of a painting poses additional challenges, because the digitization process itself may affect the ability of a machine to correctly detect artistic style; for instance, textures may be affected by the resolution of the digitization. Despite these challenges, intelligent systems for detecting artistic style would be useful for identification and retrieval of images of a similar style.

In this paper we investigate several methods based on recent advances in convolutional neural networks for large-scale determination of artistic style. In particular, we adapt the neural-style algorithm introduced in [2] for large-scale style classification, showing performance that is competitive with other deep convolutional neural network based approaches.

Figure 1: Original image on the left, after application of the ‘neural-style’ algorithm (style image ’Starry Night’, by Van Gogh) on the right.

2 Related Work

Algorithmic determination of artistic style in paintings has been considered only sporadically in the past. Examples of early efforts at style classification are [8] and [15], where the datasets used are quite small and only a handful of very distinct artistic style categories are considered. Several complex models are constructed in [14] by hand-engineering features on a large dataset similar to the one used for this work. In [7], it is demonstrated that convolutional neural networks may be effective for understanding image style in general, including artistic style in paintings. In the papers just mentioned, the number of artistic style categories is held to a relatively small 25 and 27 broadly defined categories, respectively.

In the paper “A Neural Algorithm of Artistic Style”, it is demonstrated that the correlations between the low-level feature activations in a deep convolutional neural network encode sufficient information about the style of the input image to permit a transfer of the visual style of the input image onto a new image via an algorithm informally referred to as the “neural-style” algorithm [2]. An example of the output of this algorithm is presented in Figure 1. Several authors have built upon the work of Gatys et al. in the past year [13], [12], [6]. These investigations have primarily focused on ways to improve either the quality of the style transfer or the efficiency of the algorithm. To the best of our knowledge, the only other investigation of the style representation of an image as a classifier is in [11].

3 Data and Methods

3.1 Data

The data used for this investigation consists of 76449 digitized images of fine art paintings. The vast majority of the images were originally obtained from

, the largest online repository of fine-art paintings. For convenience, we utilize a prepackaged set of images sourced and prepared by Kiri Nichols and hosted by the data-science competition website. A stratified 10% of the dataset was held out for validation purposes. We chose to use a finer set of style categories for classification than has been used in previous work on image style, as we believe that finer classification is likely necessary for practical application. We utilize 70 distinct style categories, the maximum number possible while maintaining at least 100 observations of each style category. This noticeably increases the complexity of the classification task, as many of the class boundaries are not well defined, the classes are unbalanced, and there are not nearly as many examples of each artistic style as in previous attempts at large-scale artistic style classification.
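The data preparation described above (keeping only style categories with at least 100 observations, then holding out a stratified 10%) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `filter_and_split`, the toy labels, and the fixed seed are assumptions made for the example and are not taken from the original experiments.

```python
import random
from collections import Counter, defaultdict


def filter_and_split(labels, min_count=100, holdout=0.10, seed=0):
    """Drop classes with fewer than min_count examples, then hold out a
    stratified fraction of each remaining class for validation.

    Returns (train_indices, validation_indices) into the original list.
    """
    counts = Counter(labels)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        if counts[label] >= min_count:
            by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_val = round(len(members) * holdout)  # per-class holdout size
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels (hypothetical): two styles are large enough, one is too rare.
labels = ["Impressionism"] * 300 + ["Cubism"] * 150 + ["Rare Style"] * 20
train_idx, val_idx = filter_and_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting within each class keeps the class proportions of the validation set close to those of the full dataset, which matters here because the classes are unbalanced.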

3.2 The Neural Style Algorithm

The primary insight in the neural-style algorithm outlined by Gatys et al. is that the correlations between low-level feature activations in a convolutional neural network capture information about the style of the image, while higher-level feature activations capture information about the content of the image. Thus, to construct an image $x$ that merges both the style of an image $a$ and the content of an image $p$, an image is initialized as white noise and the following two loss functions are simultaneously minimized:

$$\mathcal{L}_{content}(p, x) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

$$\mathcal{L}_{style}(a, x) = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

where $N_l$ is the number of filters in the layer, $M_l$ is the spatial dimensionality of the feature map, $F^l$ and $P^l$ represent the feature maps extracted by the network at layer $l$ from the images $x$ and $p$ respectively, and, letting $S^l$ represent the feature maps extracted by the network at layer $l$ from the image $a$, $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$ and $A^l_{ij} = \sum_k S^l_{ik} S^l_{jk}$. That is, the style loss, which encodes the image's style, is a loss taken over Gram matrices for filter activations.
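To make the Gram-matrix style loss concrete, the following is a minimal pure-Python sketch. It assumes the formulation of Gatys et al. [2] as it is commonly stated: each feature map is flattened to a row of length M, the Gram matrix entry (i, j) is the inner product of rows i and j, and the per-layer loss is normalized by 4 N² M². The function names and the tiny feature maps are hypothetical and chosen only for illustration.

```python
def gram_matrix(features):
    """features: N rows (one per filter), each a flattened map of length M.

    Returns the N x N Gram matrix with entry (i, j) = sum_k F[i][k] * F[j][k].
    """
    n = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


def style_loss(generated, style):
    """Per-layer style loss between two sets of feature maps of equal shape."""
    n = len(generated)      # number of filters in the layer
    m = len(generated[0])   # spatial size of each flattened feature map
    g = gram_matrix(generated)
    a = gram_matrix(style)
    squared = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return squared / (4.0 * n ** 2 * m ** 2)


# Toy feature maps: 2 filters, 3 spatial positions each (hypothetical values).
F = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
S = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
G = gram_matrix(F)
loss = style_loss(F, S)
print(G, loss)
```

Because the Gram matrix sums over spatial positions, it discards where in the image a feature occurs and keeps only which features co-occur, which is why it is read as a description of style rather than content.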

3.3 Style Classification

Model                                 Accuracy (top-1, %)
Convolutional Neural Network          27.47
Pretrained Residual Neural Network    36.99
Table 1: Baseline Results

To establish a baseline for style classification, we first trained a single convolutional neural network from scratch. The network has a uniform structure consisting of convolutional layers with 3x3 kernels and leaky ReLU activations (

). Between every pair of convolutional layers is a fractional max-pooling layer with a 3x3 kernel. Fractional max-pooling is used because, given the relatively small size of the dataset, the more commonly used average or max-pooling operations would lead to rapid data loss and a relatively shallow network [3]. The convolutional layer sizes are

followed by a fully-connected layer and a 70-way softmax. Dropout of 10% is applied to the fully-connected layer. Aside from mean normalization and horizontal flips, the data were not augmented in any way. The model was trained for 55 epochs using stochastic gradient descent and achieved a top-1 accuracy of 27.468%.
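The accuracy figures in this section are read here as top-1 accuracy expressed as a percentage, that is, the share of validation images whose highest-scoring class matches the true style label. A minimal sketch of that metric, with hypothetical scores and labels that are not drawn from the experiments, is:

```python
def top1_accuracy(scores, labels):
    """Percentage of rows whose highest-scoring class equals the true label.

    scores: list of per-class score lists (e.g. softmax outputs).
    labels: list of integer class indices.
    """
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical scores for four images over three style classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.20, 0.60],
    [0.40, 0.35, 0.25],
]
labels = [1, 0, 1, 0]
accuracy = top1_accuracy(scores, labels)
print(accuracy)
```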

We then finetuned a pretrained object classification model for style classification. The pretrained model used was a residual neural network with 50 layers pretrained on the ImageNet 2015 dataset. There are two motivating factors for choosing to finetune this network. The first is that residual networks currently exhibit the best performance on object recognition tasks, and previous work on style classification suggests that a network trained for the task of object recognition and then finetuned for image style detection will perform the task well [4], [7]. The second and more interesting reason from the standpoint of artistic style classification is that the architecture of a residual neural network makes the outputs of lower levels of the network available to higher levels in the network. In this way, the net functions similarly to a Long Short-Term Memory network without gates [17]. For style classification, this is particularly appealing as a means of allowing the higher levels in the net to consider both lower-level features and higher-level features when forming an artistic style classification, where the style may very much be determined by the lower-level features. The residual neural network model obtained a top-1 accuracy of 36.985%.
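The way a residual connection passes lower-level outputs upward can be illustrated with a toy sketch. This is not the 50-layer network used in the experiments: real residual blocks apply convolutions, normalization, and nonlinearities, whereas the `transform` functions below are hypothetical stand-ins acting on plain lists of numbers.

```python
def residual_block(x, transform):
    """Return x + transform(x), the identity-shortcut form of a residual block."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


def zero_transform(v):
    # A block whose learned transform contributes nothing.
    return [0.0 for _ in v]


def double_transform(v):
    # A stand-in for a learned transform (hypothetical).
    return [2.0 * vi for vi in v]


x = [1.0, 2.0]
passed_through = residual_block(x, zero_transform)    # input survives unchanged
combined = residual_block(x, double_transform)        # input plus new features
stacked = residual_block(combined, zero_transform)    # earlier output still visible
print(passed_through, combined, stacked)
```

Because each block adds its transform to its input rather than replacing it, information computed early in the network remains additively present in later layers.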

To determine whether the style representation encoded in the Gram matrices for a given image has any power as a classifier, we extracted the Gram matrices of feature activations at layers ReLU1_1, ReLU2_1, ReLU3_1, ReLU4_1, and ReLU5_1 from a VGG-19 network for the paintings described above [16]. The choice of network and layers was based on the quality of the style transfers obtained with these choices in [2]. The pretrained VGG-19 model was obtained from the Caffe Model Zoo [5]. The Gram matrices were then reshaped to account for symmetry, producing a total of 304,416 distinct features per image, nearly a factor of four greater than the total number of observations in the dataset.
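The feature count can be checked with a short calculation. The sketch below assumes that the five VGG-19 layers have 64, 128, 256, 512, and 512 filters, and that reshaping "to account for symmetry" means keeping the entries strictly above the diagonal of each Gram matrix. The second point is an inference rather than something stated in the text; it is adopted because it reproduces the reported total of 304,416. The helper names are hypothetical.

```python
# Filter counts of the VGG-19 layers used for the style representation.
VGG19_FILTERS = {
    "ReLU1_1": 64,
    "ReLU2_1": 128,
    "ReLU3_1": 256,
    "ReLU4_1": 512,
    "ReLU5_1": 512,
}


def upper_triangle(gram):
    """Flatten the entries strictly above the diagonal of a square matrix."""
    n = len(gram)
    return [gram[i][j] for i in range(n) for j in range(i + 1, n)]


def n_distinct(n_filters):
    """Number of entries strictly above the diagonal of an n x n matrix."""
    return n_filters * (n_filters - 1) // 2


sizes = {layer: n_distinct(n) for layer, n in VGG19_FILTERS.items()}
total = sum(sizes.values())
print(sizes, total)

# A small symmetric matrix to show the flattening itself.
example = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
flat = upper_triangle(example)
```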

We analyzed the style representation in two ways. First, the full feature vector was normalized and passed to a single-layer linear classifier, which was trained using Adam for 55 epochs, producing a top-1 accuracy of 13.23%.


We then built random forest classifiers on the individual Gram matrices extracted from the activations of the network. The dimensionality of the Gram matrices after reshaping is 2016, 8128, 32640, 130816, and 130816, respectively. Considered separately, the random forest classifiers built on the first three of these style representations performed better than the linear classifier based on the full style representation and better than the baseline convolutional neural network, with top-1 accuracies of 27.84%, 28.97%, and 33.46%. The random forests built on the latter two layers performed considerably worse. The results are presented in Table 2.
In contrast to results reported in [11], we observed a significant loss in accuracy when even light dimensionality reduction was applied to these smaller layers. For instance, performing PCA while preserving 90% of the variance in the data from the layer ReLU1_1 style representation reduced the accuracy of the random forest model on that layer from 27.84% to 17%, perhaps due to our use of a larger, less homogeneous dataset. We also saw no significant gains when the data were normalized.
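Preserving 90% of the variance in PCA amounts to keeping the smallest number of leading principal components whose eigenvalues sum to at least 90% of the total. A minimal sketch of that selection rule follows; the eigenvalues are hypothetical and are not taken from the ReLU1_1 data, and the function name is an assumption for the example.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues (variance along each principal axis).
eigenvalues = [6.0, 2.5, 1.0, 0.3, 0.2]
k = components_for_variance(eigenvalues)
print(k)
```

The components discarded by this rule carry little variance, but variance is not the same as class-relevant signal, which is one possible reading of the accuracy drop reported above.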

Model                                           Accuracy (top-1, %)
Full Style Representation - Linear Classifier   13.21
ReLU1_1 Random Forest                           27.84
ReLU2_1 Random Forest                           28.97
ReLU3_1 Random Forest                           33.46
ReLU4_1 Random Forest                           9.79
ReLU5_1 Random Forest                           10.18
Table 2: Style Representation Results

4 Conclusion & Future Work

The ‘neural-style’ representation of an artwork offers competitive performance as an artistic style classifier; nevertheless, in our experiments a finetuned deep neural network still obtains superior results. Our best results using the ‘neural-style’ representation of artistic style were obtained when models suitable for high-dimensional nonlinear data were constructed individually on the first three Gram matrices that form the building blocks of the style representation.

It appears that the art-historical definition of artistic style is not quite what is captured by the neural-style algorithm using this network and these layers. Nevertheless, it is clear that this information is relevant and has some predictive ability, and understanding and improving on these results is a target for future work.


The author would like to thank NVIDIA for GPU donation to support this research, for providing many of the images, the website for hosting the data, and Kiri Nichols for sourcing the data.