The last few years have seen a growing interest from the pattern recognition and computer vision community in the development of automatic tools to support the analysis of visual arts. Several successful methods have already been proposed to tackle tasks related to art analysis, from time period estimation, e.g. [2, 27], to style classification, e.g. [6, 28]. This interest has been driven mainly by the availability of large digitized artwork collections, such as WikiArt (https://www.wikiart.org), which have provided training sets for many automatic art analysis systems. A deeper understanding of visual arts has the potential to make them accessible to a wider population, both in terms of fruition and creation, thus supporting the spread of culture.
Most of the work in the literature relies solely on the pixel information of digitized paintings and drawings, which is fed into Convolutional Neural Network (CNN) models to solve classification and retrieval tasks, e.g. [11, 19, 25]. Other works combine this information with textual metadata or comments, integrating computer vision with natural language processing techniques to address multi-modal learning problems, e.g. [10, 14, 15]. In other words, the information exploited is often limited to the visual features extracted from the digitized artworks. Alternatively, these features are combined with textual features in a shared embedding space, where the two representations are projected and compared. These approaches ignore a large amount of domain knowledge, as well as already known relationships and connections that could improve the quality of existing solutions. By contrast, a knowledge base that unifies artworks with a rich plethora of metadata, contextual information, textual descriptions, etc., in a structured framework can provide a valuable resource for more powerful information retrieval and knowledge discovery tools in the artistic domain. Such a framework would benefit enthusiasts, who can use the encoded information to navigate the knowledge base. It would be especially useful to art experts who want to find new relationships between artists and/or artworks for a better understanding of past and modern art.
To fill this gap, in this paper we present our ongoing work on the development of ArtGraph: an artistic knowledge graph (KG). The proposed KG integrates information collected from WikiArt and DBpedia. It exploits the potential of the Neo4j database management system, which provides an expressive graph model and query language. An overview is provided in Fig. 1.
2 Related Work
Traditionally, automatic art analysis has been performed using hand-crafted features fed into traditional machine learning algorithms, e.g. [1, 3, 20]. Despite the encouraging results of feature engineering techniques, these early attempts soon stalled because it is difficult to make explicit the attributes to be associated with a particular artist or artwork. This knowledge typically relies on implicit and subjective expertise that a human expert may find difficult to verbalize [7, 23].
In contrast, successful applications in a range of computer vision tasks have shown that representation learning is more effective than feature engineering at extracting meaningful patterns from complex raw data. One of the first successful attempts to apply deep neural networks in this context was the work by Karayev et al. It showed that a pre-trained CNN can be quite effective in attributing the correct school of painting to an artwork. Since then, many works have used deep learning techniques based on single-input [9, 28] or multi-input models to solve artwork attribute prediction tasks from visual features. Other directions that have attracted the interest of the community include visual link retrieval [4, 24], object detection [12, 16, 18], and near-duplicate detection.
More recently, a research direction that has sparked increasing interest combines computer vision with natural language processing techniques to provide a unified framework for multi-modal retrieval tasks. In this setting, the system is asked to find an artwork based on textual comments describing it, and vice versa. The first corpus to provide artistic comments intended to support semantic art understanding, in addition to artwork images and their attributes, is the SemArt dataset. Garcia and Vogiatzis proposed several models that share the same basic scheme. First, images are encoded into visual embeddings, while descriptions and metadata attributes are encoded into textual embeddings. Then, a multi-modal model maps these embeddings into a common space where a similarity function is applied. Stefanini et al. advanced research in this domain by extending visual-semantic retrieval to a setting where the textual domain contains two kinds of sentences. Visual sentences describe the visual content of the work. Contextual sentences describe the historical context of the work, its author, the place where the artwork is located, and so on. To address this two-fold challenge, the authors proposed the Artpedia dataset. On it, they experimented with a multi-modal retrieval model that jointly associates visual and textual elements and discriminates between visual and contextual sentences. Although promising, Artpedia contains a relatively small number of artworks.
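The retrieval step shared by these models can be illustrated with a minimal sketch. It assumes the visual and textual embeddings have already been projected into the common space; the projection networks are not shown. A text query then ranks images by cosine similarity. The use of cosine similarity and the toy vectors are assumptions made for illustration, and the cited models may use a different similarity function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, candidates):
    """Return candidate ids sorted by decreasing similarity to the query."""
    scores = {cid: cosine(query, emb) for cid, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors, assumed to be already projected into the common space.
text_query = [0.9, 0.1, 0.0]
image_embeddings = {
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.1, 0.0, 1.0],
}
print(rank(text_query, image_embeddings))  # ['img_a', 'img_b', 'img_c']
```

The reverse direction, from image to text, uses the same ranking with the roles of query and candidates swapped.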
Our work is inspired by the research conducted by Garcia et al. They combined a multi-output model, trained to solve attribute prediction tasks from visual features, with a second model based on non-visual information. This information was extracted from artistic metadata and encoded using a KG. The second model was intended to inject “context” information to improve the performance of the first one. The overall framework was called ContextNet. To encode the KG information into a vector representation, the authors adopted the popular node2vec model. The KG was built using only the information provided by SemArt. A node was defined for each artwork, and each artwork was connected to its attributes, taken from metadata such as author, title, technique, etc. In addition, keywords were extracted from the titles by applying an n-gram model and added to the graph.

This design has two limitations. First, metadata are available only for the artworks in the dataset, so adding a new artwork would not bring any domain information about it. Second, the graph has author nodes, which connect artworks by the same author, but it does not consider relationships between authors, such as artistic influence. Both limitations can be overcome by relying on a source of knowledge external to the dataset, such as Wikipedia, which provides an enormous amount of information, also in structured form. Our work is framed in this direction. Furthermore, we do not treat the KG merely as an adjacency matrix from which embeddings can be extracted as auxiliary input to learning models. Instead, we encode the KG into a NoSQL graph database, namely Neo4j. Neo4j already provides a powerful knowledge discovery framework without explicitly training a learning system.
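The core of node2vec is a second-order biased random walk. The node sequences it produces are then fed to a skip-gram model, which is not shown here. The sketch below implements only the walk, on a toy graph that mimics the artwork-to-attribute structure described above. The node names, the function name and the parameter values are illustrative assumptions, not those used by Garcia et al.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased random walk in the style of node2vec.

    adj maps each node to the set of its neighbours (undirected, unweighted).
    p is the return parameter, q the in-out parameter.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])  # sorted so that a seeded rng is reproducible
        if not neighbours:
            break
        if len(walk) == 1:
            # First step: there is no previous node yet, so move uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # go back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy artwork-attribute graph (hypothetical node names).
edges = [
    ("artwork:1", "author:A"), ("artwork:1", "technique:oil"),
    ("artwork:2", "author:A"), ("artwork:2", "keyword:sea"),
    ("artwork:3", "author:B"), ("artwork:3", "technique:oil"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

walk = node2vec_walk(adj, "artwork:1", length=8, p=1.0, q=0.5, rng=random.Random(0))
print(walk)
```

Many such walks per node are treated as "sentences" for the skip-gram model that yields the node embeddings. Setting q < 1 biases the walks outward, which is DFS-like, while q > 1 keeps them local, which is BFS-like.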
Knowledge graphs have emerged as a fascinating abstraction for organizing structured knowledge and as a way to integrate information extracted from multiple data sources. KGs have also gained increasing popularity in machine (deep) learning for three uses: incorporating world knowledge, representing extracted knowledge, and explaining what is being learned. There is no commonly accepted definition of KG. Any graph-structured representation of knowledge about real-world entities and their relationships can be understood as a KG. Formally, a KG can be expressed as $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{F})$, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relationships, and $\mathcal{F}$ is the set of facts. Each fact is a triple $(h, r, t) \in \mathcal{F}$, with $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$.
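The formal definition $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{F})$ can be mirrored directly in a few lines of code. The minimal sketch below stores the three sets explicitly and registers the head, relation and tail of every added triple. This guarantees $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$ by construction. The class and relation names are illustrative assumptions, not part of any particular KG.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    """G = (E, R, F): entities, relation types and (h, r, t) facts."""

    def __init__(self):
        self.entities = set()   # E
        self.relations = set()  # R
        self.facts = set()      # F

    def add_fact(self, head, relation, tail):
        # Registering h, t in E and r in R keeps every fact well formed.
        self.entities.update((head, tail))
        self.relations.add(relation)
        self.facts.add(Fact(head, relation, tail))

    def tails(self, head, relation):
        """All t such that (head, relation, t) is in F."""
        return {f.tail for f in self.facts
                if f.head == head and f.relation == relation}


kg = KnowledgeGraph()
kg.add_fact("Claude Monet", "movement", "Impressionism")
kg.add_fact("Impression, Sunrise", "createdBy", "Claude Monet")
print(kg.tails("Claude Monet", "movement"))  # {'Impressionism'}
```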
The labeled graph representation of a KG can be used in various ways depending on the specific application. For example, if the nodes represent people, the edges can capture family relationships between them. As mentioned above, most of the art analysis methods proposed so far have focused only on visual features. Artworks, however, cannot be studied on the basis of their visual appearance alone. They must also be framed within a broader context of historical, social and contextual factors. A comprehensive KG would provide a more expressive and flexible representation. It could incorporate relationships of arbitrary complexity between art-related entities, which cannot be captured from the visual content alone.
In this view, we developed ArtGraph as a KG in the art domain capable of representing and describing concepts related to artworks. Our KG can represent a wide range of relationships, including those between artists and their works. Table 1 compares our KG with the one presented by Garcia et al. At the current stage of our research, we are focusing only on the most popular artists, because we are interested in a richer representation of the relationships between them and other entities.
The core nodes of ArtGraph are authors and artworks. Metadata extracted from WikiArt have been transformed into nodes and relationships mainly related to the artworks: their genre, style, location, etc. WikiArt does not provide rich information about authors. For this reason, each author in our KG is connected not only to the artworks they produced but also to other nodes built from RDF triples extracted from DBpedia. Extracting and integrating data from these two sources required a laborious process of data cleaning and normalization. Some manual intervention was also needed to resolve several inconsistencies between the data.
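The cleaning procedure itself is not detailed here. As a purely hypothetical illustration of one step such a procedure could involve, the sketch below matches artist names across the two sources through a canonical key. The key strips accents, case, underscores and punctuation. The function names and normalization rules are assumptions. Names that fail to match are returned separately, as candidates for the kind of manual intervention mentioned above.

```python
import re
import unicodedata


def normalize_name(name):
    """Map an artist name to a canonical key for cross-source matching.

    Strips accents, lowercases, turns underscores (as in DBpedia resource
    names) and other punctuation into spaces, and collapses whitespace.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def match_artists(wikiart_names, dbpedia_names):
    """Pair names from the two sources whose normalized keys coincide."""
    index = {normalize_name(n): n for n in dbpedia_names}
    matches, unmatched = {}, []
    for name in wikiart_names:
        key = normalize_name(name)
        if key in index:
            matches[name] = index[key]
        else:
            unmatched.append(name)  # left for manual resolution
    return matches, unmatched


wikiart = ["Paul Cézanne", "Frida Kahlo", "Hieronymus Bosch"]
dbpedia = ["Paul_Cézanne", "Frida_Kahlo"]
matches, unmatched = match_artists(wikiart, dbpedia)
print(matches)    # {'Paul Cézanne': 'Paul_Cézanne', 'Frida Kahlo': 'Frida_Kahlo'}
print(unmatched)  # ['Hieronymus Bosch']
```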
Overall, the conceptual scheme of ArtGraph (represented in Fig. 2) includes artwork nodes and artist nodes:
- Each artwork node is connected to the following nodes: tags (e.g., woman, sea, birds), genre (e.g., self-portrait), style, period, series (e.g., “The Seasons” by Giuseppe Arcimboldo), auction, media (e.g., paper, watercolor), the gallery in which the artwork is located, and the city (or country) in which the artwork was completed.
- Each artist node is connected to the following nodes: field (e.g., drawing, sculpture), movement (e.g., Surrealism, Renaissance, Pop Art), training (e.g., Accademia di Belle Arti di Firenze), Wikipedia categories (e.g., living people, people from Florence), other artists (through influence or teaching relationships), and patrons.
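To make the conceptual scheme above more concrete, the following sketch encodes it as a table of allowed combinations. Each (source label, relationship type) pair maps to a target label, and edges are checked against this table. The relationship-type names (HAS_TAG, CREATED, INFLUENCED, etc.) and some labels such as Place and Patron are hypothetical, chosen for illustration. They are not necessarily those used in the actual KG.

```python
# Hypothetical relationship types for the conceptual scheme described above.
SCHEMA = {
    ("Artwork", "HAS_TAG"): "Tag",
    ("Artwork", "HAS_GENRE"): "Genre",
    ("Artwork", "HAS_STYLE"): "Style",
    ("Artwork", "HAS_PERIOD"): "Period",
    ("Artwork", "PART_OF_SERIES"): "Series",
    ("Artwork", "SOLD_AT"): "Auction",
    ("Artwork", "MADE_OF"): "Media",
    ("Artwork", "LOCATED_IN"): "Gallery",
    ("Artwork", "COMPLETED_IN"): "Place",  # city or country
    ("Artist", "CREATED"): "Artwork",
    ("Artist", "HAS_FIELD"): "Field",
    ("Artist", "BELONGS_TO"): "Movement",
    ("Artist", "TRAINED_AT"): "Training",
    ("Artist", "IN_CATEGORY"): "WikipediaCategory",
    ("Artist", "INFLUENCED"): "Artist",
    ("Artist", "TEACHER_OF"): "Artist",
    ("Artist", "HAS_PATRON"): "Patron",
}


def invalid_edges(edges, labels):
    """Return the (source, type, target) edges that violate SCHEMA.

    labels maps each node id to its node label.
    """
    bad = []
    for src, rel, dst in edges:
        expected = SCHEMA.get((labels.get(src), rel))
        if expected is None or labels.get(dst) != expected:
            bad.append((src, rel, dst))
    return bad


labels = {
    "Giuseppe Arcimboldo": "Artist",
    "Spring": "Artwork",
    "The Seasons": "Series",
}
edges = [
    ("Giuseppe Arcimboldo", "CREATED", "Spring"),
    ("Spring", "PART_OF_SERIES", "The Seasons"),
    ("Spring", "CREATED", "Giuseppe Arcimboldo"),  # wrong direction
]
print(invalid_edges(edges, labels))
# [('Spring', 'CREATED', 'Giuseppe Arcimboldo')]
```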
This structure allows the creation of a network between artists, which is useful for further analysis. Table 1 reports the overall numbers of nodes, edges, artists and artworks in the resulting KG. The KG also includes a huge plethora of metadata and textual comments describing them.
| KG | # nodes | # edges | # authors | # artworks |
This is a work in progress, and we are aware of some limitations. Not all artworks have a textual description, and we are looking for additional sources of knowledge to overcome this limitation. Furthermore, there is no direct relationship between an artist and a city or country, so the KG contains no structured geographic information about the artists.
4 Implementation and Some Applications
ArtGraph has been implemented in Neo4j (https://neo4j.com) on a system with an Intel Core i5-10400 CPU (2.90 GHz) and 16 GB of RAM. We preferred Neo4j to other existing solutions because it is a native graph database that provides a powerful and flexible framework for storing and querying graph-like structures. In Neo4j, connections between data are stored rather than computed at query time. Cypher, the declarative query language adopted by Neo4j, takes advantage of these stored connections. It provides an expressive and optimized graph language that can execute even complex queries very quickly.
The developed web interface can also show the results of queries that are particularly useful for art analysis. These include retrieving direct and indirect influence connections between artists at different degrees of separation and identifying artworks kept in a country other than the one in which they were completed. They also include retrieving all the works kept in a particular place (Fig. 4). On the tested platform, each query takes a few tens of milliseconds. The ability to query the graph database already provides information retrieval and knowledge discovery capabilities in the art domain, without having to train a learning system.
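The semantics of the first two queries can be shown without a running Neo4j instance. The following in-memory sketch reproduces them in plain Python over toy data. The dictionary layout and the artist and artwork names are hypothetical placeholders, not the KG's actual schema. In the database, the same questions would be answered by Cypher pattern matching, for instance a variable-length pattern such as `MATCH (a:Artist {name: 'A'})-[:INFLUENCED*1..2]->(b:Artist) RETURN b`, where the relationship type INFLUENCED is likewise an assumption.

```python
from collections import deque


def influence_within(influenced, source, max_hops):
    """Artists reachable from `source` along influence edges in <= max_hops.

    `influenced` maps an artist to the artists they influenced. Returns
    {artist: degree of separation}, keeping only the shortest distance.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in influenced.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    del dist[source]
    return dist


def displaced_artworks(artworks):
    """Titles of artworks kept in a country other than where they were completed."""
    return sorted(a["title"] for a in artworks if a["completed_in"] != a["kept_in"])


# Toy data with placeholder names.
influenced = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
print(influence_within(influenced, "A", 2))  # {'B': 1, 'C': 1, 'D': 2}

artworks = [
    {"title": "Work 1", "completed_in": "Italy", "kept_in": "France"},
    {"title": "Work 2", "completed_in": "Spain", "kept_in": "Spain"},
]
print(displaced_artworks(artworks))  # ['Work 1']
```

Unlike the Cypher pattern, which returns every matching path, the breadth-first search keeps only the shortest degree of separation for each artist.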
5 Conclusions and Future Work
In this paper, we have presented ArtGraph, an artistic knowledge graph primarily intended to provide art historians with a rich and easy-to-use tool to perform art analysis. This effort can foster the dialogue between computer scientists and humanists, which is sometimes lacking today. Indeed, unlike other works, we are interested not only in leveraging KG information to learn classification tools, but also in helping tackle knowledge discovery tasks.
Work is underway to integrate the current version of ArtGraph with automatically learned visual and graph embedding features. The goal is to tackle tasks that are attracting attention in this domain, such as multi-task artwork attribute prediction, multi-modal retrieval and artwork captioning (e.g., [8, 10, 14]). To this end, the proposed graph encodes a valuable source of knowledge for developing more powerful learning models. Once stable, ArtGraph will be made publicly available to provide the pattern recognition and computer vision community with a solid foundation for further research on automatic art analysis.
Gennaro Vessio acknowledges funding support from the Italian Ministry of University and Research through the PON AIM 1852414 project.
- (2012) Towards automated classification of fine-art painting style: a comparative study. In ICPR, pp. 3541–3544.
- (2018) Leveraging known data for missing label prediction in cultural heritage context. Applied Sciences 8 (10), pp. 1768.
- (2012) Artistic image classification: an analysis on the PRINTART database. In ECCV, pp. 143–157.
- (2021) Visual link retrieval and knowledge discovery in painting datasets. Multimedia Tools and Applications 80 (5), pp. 6599–6616.
- (2021) Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview. Neural Computing and Applications.
- (2018) Fine-tuning convolutional neural networks for fine art classification. Expert Systems with Applications 114, pp. 107–118.
- (2019) A deep learning perspective on beauty, sentiment, and remembrance of art. IEEE Access 7, pp. 73694–73710.
- Iconographic image captioning for artworks. In ICPR Workshops and Challenges.
- (2019) Recognizing the style of visual arts via adaptive cross-layer correlation. In ACM MM, pp. 2459–2467.
- (2020) Explaining digital humanities by aligning images and textual descriptions. Pattern Recognition Letters 129, pp. 166–172.
- (2014) In search of art. In ECCV, pp. 54–70.
- (2016) The art of detection. In ECCV, pp. 721–737.
- (2016) Towards a definition of knowledge graphs. SEMANTiCS (Posters, Demos, SuCCESS) 48, pp. 1–4.
- (2020) ContextNet: representation and exploration for painting classification and retrieval in context. International Journal of Multimedia Information Retrieval 9 (1), pp. 17–30.
- (2018) How to read paintings: semantic art understanding with multi-modal retrieval. In ECCV.
- (2018) Weakly supervised object detection in artworks. In ECCV.
- (2016) Node2vec: scalable feature learning for networks. In ACM SIGKDD, pp. 855–864.
- (2015) Cross-depiction problem: recognition and synthesis of photographs and artwork. Computational Visual Media 1 (2), pp. 91–103.
- (2014) Recognizing image style. In BMVC.
- (2014) Painting-91: a large scale database for computational painting categorization. Machine Vision and Applications 25 (6), pp. 1385–1397.
- (2019) Digital art history and the computational imagination. International Journal for Digital Art History: Issue 3, 2018: Digital Space and Architecture 3, pp. 141.
- (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic Web 8 (3), pp. 489–508.
- (2016) Toward automated discovery of artistic influence. Multimedia Tools and Applications 75 (7), pp. 3565–3591.
- (2016) Visual link retrieval in a database of paintings. In ECCV, pp. 753–767.
- (2019) Discovering visual patterns in art collections with spatially-consistent feature learning. ICPR.
- (2019) Artpedia: a new visual-semantic dataset with visual and contextual sentences in the artistic domain. In ICIAP, pp. 729–740.
- (2017) OmniArt: multi-task deep learning for artistic data analysis. arXiv preprint arXiv:1708.00684.
- (2015) Toward discovery of the artist’s style: learning to recognize artists by their artworks. IEEE Signal Processing Magazine 32 (4), pp. 46–54.