ArtGraph: Towards an Artistic Knowledge Graph

05/31/2021 ∙ by Giovanna Castellano, et al. ∙ University of Bari Aldo Moro

This paper presents our ongoing work towards ArtGraph: an artistic knowledge graph based on WikiArt and DBpedia. Automatic art analysis has seen an ever-increasing interest from the pattern recognition and computer vision community. However, most of the current work is mainly based solely on digitized artwork images, sometimes supplemented with some metadata and textual comments. A knowledge graph that integrates a rich body of information about artworks, artists, painting schools, etc., in a unified structured framework can provide a valuable resource for more powerful information retrieval and knowledge discovery tools in the artistic domain.




1 Introduction

The last few years have seen growing interest from the pattern recognition and computer vision community in automatic tools to support the analysis of visual arts. Successful methods have already been proposed for several art analysis tasks, from time period estimation, e.g. [2, 27], to style classification, e.g. [6, 28]. This interest has been driven largely by the availability of large digitized artwork collections, such as WikiArt, which have provided training sets for many automatic art analysis systems. A deeper understanding of visual arts has the potential to make them accessible to a wider population, both in terms of fruition and creation, thus supporting the spread of culture [5].

Most of the work in the literature relies solely on the pixel information of digitized paintings and drawings, which can be fed into Convolutional Neural Network (CNN) models to solve classification and retrieval tasks, e.g. [11, 19, 25]. Other works integrate this information with textual metadata or comments, combining computer vision with natural language processing techniques to address multi-modal learning problems, e.g. [10, 14, 15]. In other words, the information exploited is often just the visual features extracted from the digitized artworks, or these features are used together with textual features to create a shared embedding space in which the two representations are projected and compared. Both approaches ignore a large amount of domain knowledge, as well as already known relationships and connections, that could improve the quality of existing solutions. In contrast, a knowledge base in which artworks are unified with a rich body of metadata, contextual information, textual descriptions, etc., in a structured framework can provide a valuable resource for more powerful information retrieval and knowledge discovery tools in the artistic domain. Such a framework would benefit not only enthusiastic users, who can take advantage of the encoded information to navigate the knowledge base, but also, and especially, art experts interested in finding new relationships between artists and/or artworks for a better understanding of past and modern art.

Figure 1: Overview of ArtGraph. For better visualization, only a small fraction of 2K nodes is shown here.

To fill this gap, in this paper we present our ongoing work on the development of ArtGraph: an artistic knowledge graph (KG). The proposed KG integrates information collected from WikiArt and DBpedia, and exploits the potential of the Neo4j database management system, which provides an expressive modeling and graph query language. An overview is provided in Fig. 1.

The rest of the paper is structured as follows. Section 2 deals with related work. Section 3 presents the proposed graph. Section 4 shows implementation details and some applications of the current version of ArtGraph. Section 5 concludes the work and outlines our future work.

2 Related Work

Traditionally, automatic art analysis has been performed using hand-crafted features fed into traditional machine learning algorithms, e.g. [1, 3, 20]. Unfortunately, despite the encouraging results of feature engineering techniques, early attempts soon stalled due to the difficulty of gaining explicit knowledge about the attributes to be associated with a particular artist or artwork. This difficulty arises because this knowledge typically depends on an implicit and subjective experience that a human expert might find difficult to verbalize [7, 23].

In contrast, several successful applications in a range of computer vision tasks have demonstrated the effectiveness of representation learning versus feature engineering techniques in extracting meaningful patterns from complex raw data. One of the first successful attempts to apply deep neural networks in this context was the research presented by Karayev et al. [19], which shows how a pre-trained CNN can be quite effective in attributing the correct school of painting to an artwork. Since then, many works have focused on the use of deep learning techniques based on single-input [9, 28] or multi-input models [27] to solve artwork attribute prediction tasks based on visual features. Other directions that have attracted the interest of the community working on this domain are visual link retrieval [4, 24], object detection [12, 16, 18], and near-duplicate detection [25].

Recently, a research direction that has attracted increasing interest combines computer vision with natural language processing techniques to provide a unified framework for solving multi-modal retrieval tasks. In this setting, the system is asked to find an artwork based on textual comments describing it, and vice versa. The first corpus that provides not only artwork images and their attributes, but also artistic comments intended to achieve semantic art understanding, is the SemArt dataset [15]. Garcia and Vogiatzis have proposed several models that basically share the same scheme: first, images, descriptions and metadata attributes are encoded in visual and textual embeddings, respectively; then, a multi-modal model is applied to map these embeddings into a common space where a similarity function is used. In [26], Stefanini et al. extended the task of visual-semantic retrieval to a setting where the textual domain does not exclusively contain visual sentences, i.e. those describing the visual content of the work, but also contextual sentences, which describe the historical context of the work, its author, the place where the artwork is located, and so on. To address this twofold task, the authors proposed the Artpedia dataset, on which they experimented with a multi-modal retrieval model that jointly associates visual and textual elements, and discriminates between visual and contextual sentences. Although valuable, Artpedia contains a relatively small number of artworks.
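The shared-embedding retrieval scheme described above can be illustrated with a minimal sketch. It is not the model of any cited work: the projection matrices `W_VISUAL` and `W_TEXT`, the toy embeddings, and the function names are all assumptions made for illustration, and in practice the projections would be learned rather than fixed.

```python
import math


def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical fixed projections into a 2-d common space (assumed, not learned).
W_VISUAL = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d visual -> 2-d common
W_TEXT = [[0.0, 1.0], [1.0, 0.0]]              # 2-d textual -> 2-d common

# Toy visual embeddings for two artworks (illustrative values only).
IMAGES = {"img_1": [1.0, 0.0, 0.5], "img_2": [0.0, 1.0, 0.5]}


def rank_images(text_embedding, images):
    """Rank image names by similarity to a text query in the common space."""
    query = project(W_TEXT, text_embedding)
    scores = {
        name: cosine(query, project(W_VISUAL, emb))
        for name, emb in images.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_images([0.0, 1.0], IMAGES))
```

The same ranking function works in the opposite direction (image to text) by swapping the roles of the two projections.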

Our work is inspired by research conducted by Garcia et al. [14]. They combined a multi-output model trained to solve attribute prediction tasks based on visual features with a second model based on non-visual information extracted from artistic metadata encoded using a KG. This second model was intended to inject “context” information to improve the performance of the first. The general framework was called ContextNet. To encode the KG information into a vector representation, the popular node2vec model [17] was adopted. The KG was built using only the information provided by SemArt. To do this, the authors defined a node for each artwork and connected each artwork to its attributes, using metadata such as author, title, technique, etc. In addition, keywords were extracted from the title by applying an n-gram model and added to the graph. This design has two limitations. First, metadata are only available for artworks in the dataset, so adding a new artwork would not bring any domain information about it. Second, the proposed graph has an author node, which connects artworks by the same author, but does not consider relationships between authors, such as artistic influence. These two limitations can be overcome by relying on a source of knowledge external to the dataset, such as Wikipedia, which provides an enormous amount of information, even in a structured form. Our work is framed in this direction. Furthermore, we do not treat the KG just as an adjacency matrix from which embeddings can be extracted as auxiliary information to be provided to learning models. Instead, we encode the KG into a NoSQL database, namely Neo4j, which already provides a powerful knowledge discovery framework without explicitly training a learning system.
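To make the structure of a metadata-only graph concrete, the following sketch builds one from two made-up records. It is only an approximation of the idea: the records, field names, stopword list, and the simple stopword-filtered n-gram extraction are assumptions, not the pipeline used in [14].

```python
from collections import defaultdict

# Hypothetical metadata records; field names and values are illustrative only.
ARTWORKS = [
    {"id": "a1", "author": "Author A", "technique": "oil on canvas",
     "title": "Portrait of a Lady"},
    {"id": "a2", "author": "Author A", "technique": "fresco",
     "title": "Lady with a Fan"},
]

STOPWORDS = {"of", "a", "with", "the", "and", "in"}


def title_keywords(title, n=1):
    """Return the word n-grams of a title, skipping any containing a stopword."""
    words = [w.lower() for w in title.split()]
    grams = [words[i:i + n] for i in range(len(words) - n + 1)]
    return [" ".join(g) for g in grams if not any(w in STOPWORDS for w in g)]


def build_context_graph(artworks):
    """Build an undirected graph linking each artwork to its attribute nodes."""
    adjacency = defaultdict(set)
    for art in artworks:
        art_node = ("artwork", art["id"])
        attribute_nodes = [("author", art["author"]),
                           ("technique", art["technique"])]
        attribute_nodes += [("keyword", k) for k in title_keywords(art["title"])]
        for node in attribute_nodes:
            adjacency[art_node].add(node)
            adjacency[node].add(art_node)
    return adjacency


graph = build_context_graph(ARTWORKS)
# Artworks by the same author meet at the author node,
# but no edge ever links one author to another.
print(sorted(graph[("author", "Author A")]))
print(sorted(graph[("keyword", "lady")]))
```

In this toy graph every edge has an artwork at one end, which is exactly why author-to-author relationships such as influence cannot be expressed without an external source.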

3 ArtGraph

Knowledge graphs have emerged as a fascinating abstraction for organizing structured knowledge and as a way to integrate information extracted from multiple data sources. KGs have also begun to gain increasing popularity in machine (deep) learning as a method of incorporating world knowledge, as a representation of extracted knowledge, and for explaining what is being learned. There is no commonly accepted definition of KG [13]. Any representation of knowledge of real world entities and relationships, but structured like a graph, can be understood as a KG [22]. Formally, a KG can be expressed as $\mathcal{G} = \{\mathcal{E}, \mathcal{R}, \mathcal{F}\}$, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relationships, and $\mathcal{F}$ is the set of facts. Each fact is a triple $(h, r, t) \in \mathcal{F}$, with $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$.
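As a minimal sketch of this formal view, a KG can be held in memory as a set of entities, a set of relationships, and a set of (head, relation, tail) facts. The names `E`, `R`, `F`, the `Fact` type, and the relationship labels `created` and `part_of` are chosen here for illustration and are not taken from the ArtGraph schema.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    head: str      # h, an entity
    relation: str  # r, a relationship
    tail: str      # t, an entity


# Illustrative sets; relationship names are assumptions.
E = {"Giuseppe Arcimboldo", "The Seasons", "Spring"}
R = {"created", "part_of"}
F = {
    Fact("Giuseppe Arcimboldo", "created", "Spring"),
    Fact("Spring", "part_of", "The Seasons"),
}


def is_valid_kg(entities, relations, facts):
    """Check that every fact (h, r, t) has h and t among the entities and r among the relations."""
    return all(
        f.head in entities and f.tail in entities and f.relation in relations
        for f in facts
    )


print(is_valid_kg(E, R, F))
```

A fact that uses an unknown entity or relationship makes the check fail, which is a cheap sanity test when merging triples from several sources.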

The labeled graph representation of a KG can be used in various ways depending on the specific application. For example, if the nodes represent people, the edges can capture family relationships between them. As mentioned above, most of the art analysis methods proposed so far have focused only on visual features. Artworks, however, should be studied considering not only their visual appearance but also the various historical, social and contextual factors that allow them to be placed within a more complex framework. A comprehensive KG would provide a more expressive and flexible representation to incorporate relationships of arbitrary complexity between entities related to art, which cannot be obtained by considering only the visual content.

In this view, we developed ArtGraph as a KG in the art domain capable of representing and describing concepts related to artworks. Our KG can represent a wide range of relationships, including those between artists and their works. A comparison between our proposed KG and the one presented by Garcia et al. is provided in Table 1. It is worth noting that, at the current stage of our research, we are focusing only on (the most popular) artists, as we are interested in a richer representation of the relationships between them and other entities.

The core nodes of ArtGraph are authors and artworks. Metadata extracted from WikiArt have been transformed into relationships and nodes mainly related to the artworks, their genre, style, location, etc. Furthermore, since WikiArt does not provide rich information about authors, each author of our KG is connected not only to the artworks produced but also to other nodes built using RDF triples extracted from DBpedia. Extracting and integrating data from these two sources required a laborious process of data cleaning and normalization, as well as some manual intervention to resolve several inconsistencies between the data.

Overall, the conceptual scheme of ArtGraph (represented in Fig. 2) includes artwork nodes and artist nodes:

  • Each artwork node is connected to the following nodes: tags (e.g., woman, sea, birds), genre (e.g., self-portrait), style, period, series (e.g., “The Seasons” by Giuseppe Arcimboldo), auction, media (e.g., paper, watercolor), the gallery in which the artwork is located, and the city (or country) in which the artwork has been completed.

  • Each artist node is connected to the following nodes: field (e.g., drawing, sculpture), movement (e.g., Surrealism, Renaissance, Pop Art), training (e.g., Accademia di Belle Arti di Firenze), Wikipedia categories (e.g., living people, people from Florence), other artists (influences or teaching, and patrons).

This structure allows the creation of a network between artists, which is useful for further analysis. In total, the resulting KG comprises artists, artworks, and a large body of metadata and textual comments describing them (Table 1).
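The conceptual scheme can be summarized as a small lookup of which node types each core node connects to. This is only a sketch: the label names are assumptions derived from the description above, not the actual labels used in the database, and the artist-to-artist entry stands for the influence and teaching links.

```python
# Node types reachable from each core node type.
# Label names are hypothetical; the text lists only the kinds of connected nodes.
SCHEMA = {
    "Artwork": {"Tag", "Genre", "Style", "Period", "Series", "Auction",
                "Media", "Gallery", "City", "Country"},
    "Artist": {"Artwork", "Field", "Movement", "Training", "Category",
               "Artist"},
}


def is_allowed(source_label, target_label):
    """Return True if the scheme has an edge from source_label to target_label."""
    return target_label in SCHEMA.get(source_label, set())


for source, target in [("Artist", "Artist"), ("Artwork", "Genre"),
                       ("Artist", "City")]:
    print(source, "->", target, is_allowed(source, target))
```

The artist-to-artist entry is what makes a network between artists possible, while the absence of an artist-to-city entry reflects the lack of a direct link between artists and places.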

KG | # nodes | # edges | # authors | # artworks | # relations btw artworks | # relations btw authors
Table 1: Comparison between our KG and the one proposed by Garcia et al. [14]. It is worth noting that, although SemArt has a large number of unique authors, most of them are associated with fewer than ten artworks.
Figure 2: Scheme of ArtGraph. The nodes correspond to relevant entities in the artistic domain, while the edges represent existing relationships between them.

This is a work in progress and we are aware of some limitations. Not all artworks have a textual description, and we are looking for additional sources of knowledge to overcome this limitation. Furthermore, there is no direct relationship between an artist and a city/country, so there is no structured geographic information about the artists.

4 Implementation and Some Applications

ArtGraph has been implemented in Neo4j on an i5-10400 system with a 2.90 GHz CPU and 16 GB of RAM. We preferred Neo4j to other existing solutions as it is a native graph database that provides a powerful and flexible framework for storing and querying graph-like structures. In Neo4j, connections between data are stored rather than calculated at query time. Cypher, the declarative query language adopted by Neo4j, takes advantage of these stored connections to provide an expressive language, optimized for graphs, that can execute even complex queries very quickly.
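The difference between stored and computed connections can be pictured with a toy comparison in plain Python. This is an analogy only and says nothing about how Neo4j is implemented internally; the edge list, node names, and function names are placeholders.

```python
from collections import defaultdict

# Hypothetical edges as (head, relation, tail); names are placeholders.
EDGES = [
    ("artist_a", "INFLUENCED", "artist_b"),
    ("artist_b", "INFLUENCED", "artist_c"),
    ("artist_a", "TAUGHT", "artist_c"),
]


def neighbours_by_scan(edges, node):
    """Connections computed at query time: scan every edge for a match."""
    return [tail for head, _, tail in edges if head == node]


def build_adjacency(edges):
    """Connections stored up front: each node keeps its own neighbour list."""
    adjacency = defaultdict(list)
    for head, _, tail in edges:
        adjacency[head].append(tail)
    return adjacency


ADJ = build_adjacency(EDGES)
print(neighbours_by_scan(EDGES, "artist_a"))  # cost grows with the total number of edges
print(ADJ["artist_a"])                        # cost grows with this node's own degree
```

Both lookups return the same neighbours; the stored form simply avoids touching edges that do not belong to the node being expanded, which matters most for multi-hop traversals.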



Figure 3: The home page of the developed web interface (a) and an example of artist page (b).

To allow for a visual exploration of the graph, we have created a web interface (Fig. 3) that uses JavaScript to connect to Neo4j. The goal is to provide the end user (as mentioned above, not only a generic user but especially an art historian) with an easy-to-use exploration tool to view the properties of an artwork or an artist. An art historian, in fact, rarely analyzes artworks as isolated creations, but typically studies how different paintings, even from different periods, relate to each other, how artists from different countries and/or periods have exercised an influence on their works, how artworks completed in one place migrated to other places, and so on. The home page randomly loads artists and artworks. Each artist is associated with a page that reports information such as the biography, the works produced, etc. We leveraged the information provided by DBpedia to also show the fields, movements, other authors who have been influenced by the current artist, and many other tags. By clicking on the buttons, the user can browse the graph interactively. The page layout of an artwork is very similar to that of an author and reports size, period, material, etc. It is also possible to browse the artworks according to the city/country in which they were completed or are currently located. When provided by DBpedia, a textual description of the artwork is also shown.

The developed web interface can also show the results of some queries which can be particularly useful for art analysis, such as: retrieving the direct and indirect influence connections between artists with different degrees of separation; identifying artworks that are stored in a country other than the one in which they were completed; retrieving all the works that are kept in a particular place (Fig. 4). On the tested platform, each query takes a few tens of milliseconds. The ability to query the graph database already provides information retrieval and knowledge discovery capabilities in the art domain without having to train a learning system.
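The three kinds of query mentioned above might look roughly as follows in Cypher, here wrapped in Python strings. Every label, relationship type, and property name (`Artist`, `Artwork`, `Country`, `INFLUENCED`, `COMPLETED_IN`, `LOCATED_IN`, `name`, `title`) is a guess made for illustration, since the actual schema names are not given in the text. The `run_query` helper assumes the official `neo4j` Python driver is installed and a server is reachable.

```python
def influence_query(max_hops):
    """Build a Cypher query for influence paths of at most max_hops steps."""
    if not isinstance(max_hops, int) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    # The hop bound is inlined after validation rather than passed as a parameter.
    return (
        "MATCH path = (a:Artist {name: $name})"
        f"-[:INFLUENCED*1..{max_hops}]->(b:Artist) "
        "RETURN path"
    )


# Artworks whose current country differs from the country of completion.
DISPLACED_ARTWORKS_QUERY = (
    "MATCH (w:Artwork)-[:COMPLETED_IN]->(origin:Country), "
    "(w)-[:LOCATED_IN]->(current:Country) "
    "WHERE origin <> current "
    "RETURN w.title AS title, origin.name AS origin, current.name AS current"
)

# All artworks kept in a given place.
ARTWORKS_IN_PLACE_QUERY = (
    "MATCH (w:Artwork)-[:LOCATED_IN]->(:Country {name: $place}) "
    "RETURN w.title AS title"
)


def run_query(uri, user, password, query, **params):
    """Run a query and return its records as dictionaries (needs the neo4j package)."""
    from neo4j import GraphDatabase  # imported lazily so the sketch loads without it

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query, **params)]


print(influence_query(2))
```

A call such as `run_query(uri, user, password, ARTWORKS_IN_PLACE_QUERY, place="Italy")` would then return one dictionary per matching artwork, assuming the database uses the names guessed here.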

Figure 4: From top to bottom, left to right, examples of query results: retrieving the direct and indirect influence between artists; identifying artworks that are stored in a country other than the one in which they were completed; retrieving all the works that are kept in a particular place (e.g., Italy). The colors are automatically set by the Neovis.js visualization tool to reflect some properties of the sub-graph.

5 Conclusions and Future Work

In this paper, we have presented ArtGraph, an artistic knowledge graph primarily intended to provide art historians with a rich and easy-to-use tool to perform art analysis. This effort can foster the dialogue between computer scientists and humanists that is currently sometimes lacking [21]. Indeed, contrary to other works, we are not only interested in leveraging the KG information to learn classification tools, but also to help tackle knowledge discovery tasks.

Work is underway to integrate the current version of ArtGraph with automatically learned visual and graph embedding features to tackle different tasks such as multi-task artwork attribute prediction, multi-modal retrieval and artwork captioning, which are attracting attention in this domain (e.g., [8, 10, 14]). To this end, the proposed graph encodes a valuable source of knowledge to develop more powerful learning models. Once stable, we will make ArtGraph publicly available to provide the pattern recognition and computer vision community with a good foundation for further research on automatic art analysis.


Gennaro Vessio acknowledges funding support from the Italian Ministry of University and Research through the PON AIM 1852414 project.


  • [1] R. S. Arora and A. Elgammal (2012) Towards automated classification of fine-art painting style: a comparative study. In ICPR, pp. 3541–3544. Cited by: §2.
  • [2] A. Belhi, A. Bouras, and S. Foufou (2018) Leveraging known data for missing label prediction in cultural heritage context. Applied Sciences 8 (10), pp. 1768. Cited by: §1.
  • [3] G. Carneiro, N. P. da Silva, A. Del Bue, and J. P. Costeira (2012) Artistic image classification: an analysis on the PRINTART database. In ECCV, pp. 143–157. Cited by: §2.
  • [4] G. Castellano, E. Lella, and G. Vessio (2021) Visual link retrieval and knowledge discovery in painting datasets. Multimedia Tools and Applications 80 (5), pp. 6599–6616. Cited by: §2.
  • [5] G. Castellano and G. Vessio (2021) Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview. Neural Computing and Applications. Cited by: §1.
  • [6] E. Cetinic, T. Lipic, and S. Grgic (2018) Fine-tuning convolutional neural networks for fine art classification. Expert Systems with Applications 114, pp. 107–118. Cited by: §1.
  • [7] E. Cetinic, T. Lipic, and S. Grgic (2019) A deep learning perspective on beauty, sentiment, and remembrance of art. IEEE Access 7, pp. 73694–73710. Cited by: §2.
  • [8] E. Cetinic (2021) Iconographic image captioning for artworks. In ICPR Workshops and Challenges. Cited by: §5.
  • [9] L. Chen and J. Yang (2019) Recognizing the style of visual arts via adaptive cross-layer correlation. In ACM MM, pp. 2459–2467. Cited by: §2.
  • [10] M. Cornia, M. Stefanini, L. Baraldi, M. Corsini, and R. Cucchiara (2020) Explaining digital humanities by aligning images and textual descriptions. Pattern Recognition Letters 129, pp. 166–172. Cited by: §1, §5.
  • [11] E. J. Crowley and A. Zisserman (2014) In search of art. In ECCV, pp. 54–70. Cited by: §1.
  • [12] E. J. Crowley and A. Zisserman (2016) The art of detection. In ECCV, pp. 721–737. Cited by: §2.
  • [13] L. Ehrlinger and W. Wöß (2016) Towards a definition of knowledge graphs. SEMANTiCS (Posters, Demos, SuCCESS) 48, pp. 1–4. Cited by: §3.
  • [14] N. Garcia, B. Renoust, and Y. Nakashima (2020) ContextNet: representation and exploration for painting classification and retrieval in context. International Journal of Multimedia Information Retrieval 9 (1), pp. 17–30. Cited by: §1, §2, Table 1, §5.
  • [15] N. Garcia and G. Vogiatzis (2018) How to read paintings: semantic art understanding with multi-modal retrieval. In ECCV. Cited by: §1, §2.
  • [16] N. Gonthier, Y. Gousseau, S. Ladjal, and O. Bonfait (2018) Weakly supervised object detection in artworks. In ECCV. Cited by: §2.
  • [17] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In ACM SIGKDD, pp. 855–864. Cited by: §2.
  • [18] P. Hall, H. Cai, Q. Wu, and T. Corradi (2015) Cross-depiction problem: recognition and synthesis of photographs and artwork. Computational Visual Media 1 (2), pp. 91–103. Cited by: §2.
  • [19] S. Karayev, M. Trentacoste, H. Han, A. Agarwala, T. Darrell, A. Hertzmann, and H. Winnemoeller (2014) Recognizing image style. In BMVC. Cited by: §1, §2.
  • [20] F. S. Khan, S. Beigpour, J. Van de Weijer, and M. Felsberg (2014) Painting-91: a large scale database for computational painting categorization. Machine Vision and Applications 25 (6), pp. 1385–1397. Cited by: §2.
  • [21] G. Mercuriali (2019) Digital art history and the computational imagination. International Journal for Digital Art History: Issue 3, 2018: Digital Space and Architecture 3, pp. 141. Cited by: §5.
  • [22] H. Paulheim (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic web 8 (3), pp. 489–508. Cited by: §3.
  • [23] B. Saleh, K. Abe, R. S. Arora, and A. Elgammal (2016) Toward automated discovery of artistic influence. Multimedia Tools and Applications 75 (7), pp. 3565–3591. Cited by: §2.
  • [24] B. Seguin, C. Striolo, F. Kaplan, et al. (2016) Visual link retrieval in a database of paintings. In ECCV, pp. 753–767. Cited by: §2.
  • [25] X. Shen, A. A. Efros, and M. Aubry (2019) Discovering visual patterns in art collections with spatially-consistent feature learning. In CVPR. Cited by: §1, §2.
  • [26] M. Stefanini, M. Cornia, L. Baraldi, M. Corsini, and R. Cucchiara (2019) Artpedia: a new visual-semantic dataset with visual and contextual sentences in the artistic domain. In ICIAP, pp. 729–740. Cited by: §2.
  • [27] G. Strezoski and M. Worring (2017) OmniArt: multi-task deep learning for artistic data analysis. arXiv preprint arXiv:1708.00684. Cited by: §1, §2.
  • [28] N. Van Noord, E. Hendriks, and E. Postma (2015) Toward discovery of the artist’s style: learning to recognize artists by their artworks. IEEE Signal Processing Magazine 32 (4), pp. 46–54. Cited by: §1, §2.