Use of OWL and Semantic Web Technologies at Pinterest

07/03/2019
by Rafael S. Gonçalves, et al.

Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pinterest, to help both content recommendation and ads targeting. In this paper, we present the engineering of an OWL ontology (the Pinterest Taxonomy) that forms the core of Pinterest's knowledge graph, the Pinterest Taste Graph. We describe modeling choices and enhancements to WebProtégé that we used for the creation of the ontology. In two months, eight Pinterest engineers, without prior experience of OWL and WebProtégé, revamped an existing taxonomy of noisy terms into an OWL ontology. We share our experience and present the key aspects of our work that we believe will be useful for others working in this area.


1 Introduction

Pinterest (https://www.pinterest.com) was founded in 2010, and is headquartered in San Francisco, California. Pinterest offers a visual discovery engine that helps people find things that they like: things they might like to do, such as scuba diving; places they might like to visit, such as tropical islands; garments they might like to wear, such as a Bohemian dress; and so on. More specifically, Pinterest offers users a collection of digital pin-boards, or simply boards (Figure 1). Users, known as “Pinners”, save bookmarks for Web content, known as Pins, to boards. A Pin can be shared amongst boards and is visualized by an image that summarizes what the Pin represents. Clicking a Pin takes a user to the underlying Web page that hosts the image and related content.

Both Pins and Pinners are highly diverse and the amount of content is substantial: Pinterest hosts over 175 billion Pins and has over 250 million monthly active users. To recommend the most relevant Pins to its users, and to achieve precise ads targeting, Pinterest defines a set of interests. These interests are simply terms that describe what each Pin/image on Pinterest is about, and what each user on Pinterest is interested in. The interests are organized in a hierarchical structure, called “the Pinterest Taxonomy”. Behind the scenes, Pinterest categorizes both Pins and Pinners into one or more interests. By knowing what a user is interested in and what each Pin is about in the same categorization space, it becomes easier to provide personalized recommendations. Advertisers can also use the Pinterest Taxonomy to create ad campaigns on Pinterest by selecting interests from the taxonomy. The selected interests essentially identify groups of Pinterest users who will be targeted by the campaigns.

Figure 1: An example of a Pinner’s home feed. The feed displays examples of Pins that the Pinner has saved to their boards and suggested Pins that they might also be interested in. Pins consist of a representative image, a title and a snippet of text. Clicking on a Pin will take the user to the Web content that the Pin represents or bookmarks.

In this paper, we describe the engineering process behind the Pinterest Taxonomy, and we discuss key aspects of the work carried out by the Pinterest and Protégé teams that we believe are relevant and useful to others working in this area. We discuss the use of OWL to model the content in Pinterest, and the benefits of using OWL over previous spreadsheet-based representations. We describe the WebProtégé collaborative editing environment that was used to create, maintain, and evolve the Pinterest Taxonomy, and we document the extensions to WebProtégé that we implemented to optimize the taxonomy construction workflow at Pinterest.

2 Nomenclature

In what follows we define the nomenclature that we use throughout the rest of the paper. Figure 2 shows a schematic of the Pinterest nomenclature and represents an abstract view of the “Pinterest Taste Graph”.

Pin

An image, or visual bookmark on Pinterest that includes a description and links to an external URL.

Pinner

A Pinterest user who creates and/or saves Pins to their boards.

Interest

A concept that denotes what a Pin is about, or what a user is interested in. The Interest can be simply an answer to the question “What are you interested in?”, which could be “Photography”, “Cooking”, etc. Interests can be very broad, for example, “Event Planning” or “Food and Drink”, or specific, for example, “Scuba Diving”, or very specific, for example, “DIY Pom Pom”.

Board

A collection where Pinners organize their Pins in context as they plan. For example, a Pinner could create a board called Italian Recipes and add “Pizza Recipe”, “Home Made Pesto Sauce” and “Veggie Lasagna” Pins to it.

Taxonomy

Extended from the scientific definition [1], the Pinterest Taxonomy is a hierarchical arrangement of interests. In OWL, the taxonomy roughly corresponds to the class hierarchy of an ontology.

Vertical

A top level node in the Pinterest Taxonomy and its sub-trees. Examples of verticals are “Women’s Fashion”, “Home Decor” and “Architecture”. Figure 2 depicts a (very small) portion of the “Architecture” vertical.

Taste Graph

The Pinterest knowledge graph. The Taste Graph [6] is a graph formed by combining the Pinterest Taxonomy with nodes representing Pinners and Pins. Every Pinner node and every Pin node is associated with one or more Pinterest Taxonomy nodes.

WebProtégé

A collaborative cloud-based OWL ontology development environment. A WebProtégé user can log in to WebProtégé, create an ontology project, and then share the project with geographically distributed collaborators. Users see and discuss ontology changes in real-time.

Figure 2: A conceptual view of the Pinterest Taste Graph. A Pinner saves a Pin to one of their Boards. The Pin depicts/represents one of the Pinner’s interests. Here, the Pinner is interested in “Mid Century Architecture”, which is an interest in “Architecture”. Architecture is a top-level interest, which is known as a Vertical. The hierarchy of interests is known as the Pinterest Taxonomy, or “the taxonomy” for short. (Note that the relationship of the Pin to the board is not shown here.)

3 From Pins, to Interests, to a Taxonomy, to an Ontology

Pinterest has come a long way in understanding its Pins and users, and in using that understanding to power its business. The company started out with keyword-based understanding: extracting keywords from each Pin, cleaning and canonicalizing them, and labeling the Pins with these keywords. Then, based on users’ engagement with the Pins, the users were labeled with keywords too. Pinterest then leveraged the labels on both the Pins and the users for recommendation and ads targeting.

Two years ago, Pinterest decided to enable an “interest” based ads targeting interface, which named the most popular keywords as interests, and organized them in a tree structure (a taxonomy) for advertisers to pick nodes from for their campaigns. For example, if Home Depot creates a campaign for the interest “Living Room”, both the users interested in “Living Room” in general, as well as users interested in “Sofas”, “TV Stands”, “French Living Room Style”, “Living Room Decor”, etc., will all be exposed to the ads of this campaign. Having realized the potential that such a structured knowledge representation provides, Pinterest decided to investigate ways to (1) improve the quality of the taxonomy; (2) augment the scope of the taxonomy as needed to provide coverage for all Pinterest content; and (3) settle on principled and robust processes by which the taxonomy is constructed, maintained, and reviewed.
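The campaign behavior described above amounts to expanding a selected interest to itself plus all of its descendants. The following is a minimal sketch of that expansion, not Pinterest's production logic. The `PARENT` mapping, the function names, and the exact placement of the example interests in the hierarchy are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical child -> parent edges. The interest names come from the text,
# but their exact placement in the hierarchy is assumed for this example.
PARENT = {
    "Living Room": "Home Decor",
    "Sofas": "Living Room",
    "TV Stands": "Living Room",
    "Living Room Decor": "Living Room",
    "French Living Room Style": "Living Room Decor",
    "Kitchen": "Home Decor",
}


def build_children(parent_of):
    """Invert a child -> parent mapping into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def targeted_interests(selected, parent_of):
    """Return the selected interest together with all of its descendants."""
    children = build_children(parent_of)
    result, stack = set(), [selected]
    while stack:
        node = stack.pop()
        if node not in result:
            result.add(node)
            stack.extend(children.get(node, []))
    return result
```

Under these assumptions, a campaign on “Living Room” covers “Sofas”, “TV Stands”, “Living Room Decor” and “French Living Room Style”, but not the sibling interest “Kitchen”.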

Given the wide diversity and the (sometimes highly) specific nature of the Pins hosted on Pinterest, existing public taxonomies such as the Google Product Taxonomy (https://www.google.com/basepages/producttype/taxonomy.en-AU.txt) are insufficient to fully describe Pinterest content. Thus, Pinterest decided to develop its own taxonomy, targeted at the kind of content that it hosts, and focusing on a specific business use-case: ads targeting. In the first version of the Pinterest Taxonomy, Pinterest had manually defined and organized 400 interests in a two-level hierarchy, most of which were very broad. Subsequently, Pinterest released a far more granular version, with 6,000 interests organized in a three-level hierarchy. This version was entirely curated in a spreadsheet that was generated based on terms from top queries in Pinterest. One year later, based on feedback from advertisers, Pinterest decided to improve the taxonomy in terms of the quality and quantity of the interests in it. At that point, the Pinterest and Protégé teams began working on an OWL version of the Pinterest Taxonomy.

Pinterest started out editing the Pinterest Taxonomy in spreadsheets. Soon after, the Pinterest Content team realized that it was difficult to visualize, keep track of changes, and associate interests with metadata. Thus, Pinterest decided to adopt a standard knowledge representation language and more appropriate tooling for collaborative taxonomy editing. Pinterest chose to use the Web Ontology Language (OWL) [2] to model their taxonomy, and WebProtégé [4] as the collaborative development environment.

Since the adoption of OWL and WebProtégé, Pinterest has greatly shortened the development cycle of the Pinterest Taxonomy. The new tooling has made it possible for Pinterest to build an end-to-end system in just two months. In this time span, Pinterest designed an OWL ontology in WebProtégé; built guidelines and workflows for human curation using WebProtégé; loaded the 6,000 interest taxonomy into WebProtégé; enriched it with 5,000 new interests extracted from user provided content; cleaned it up and re-organized it; and developed an engineering pipeline to consume the ontology, and use its content to populate a relational database for internal product consumption.
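The last step of this pipeline, populating a relational database from the ontology, can be illustrated with a small sketch. The actual Pinterest schema and pipeline are not described in this paper. The table name `interest`, its columns, and the in-memory SQLite database below are assumptions chosen only to show the idea of storing each interest with a pointer to its parent.

```python
import sqlite3

# Hypothetical rows extracted from the ontology: (id, label, parent_id).
INTERESTS = [
    (1, "Architecture", None),
    (2, "Mid Century Architecture", 1),
]


def load_taxonomy(rows):
    """Create an in-memory table and load one row per interest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interest ("
        " id INTEGER PRIMARY KEY,"
        " label TEXT NOT NULL UNIQUE,"
        " parent_id INTEGER REFERENCES interest(id))"
    )
    conn.executemany(
        "INSERT INTO interest (id, label, parent_id) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def children_of(conn, label):
    """Return the labels of the direct children of the given interest."""
    cursor = conn.execute(
        "SELECT c.label FROM interest c"
        " JOIN interest p ON c.parent_id = p.id"
        " WHERE p.label = ? ORDER BY c.label",
        (label,),
    )
    return [row[0] for row in cursor.fetchall()]
```

The `UNIQUE` constraint on `label` mirrors the requirement that interest names be unambiguous on their own.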

4 Key Requirements

The insights that Pinterest drew from building the initial versions of the Pinterest Taxonomy, and from advertisers’ feedback, have provided us with concrete business requirements for this project. We list and discuss these requirements below. Subsequently, we enumerate the tooling requirements needed to fulfil the business requirements.

4.0.1 Business Requirements

The Pinterest Taxonomy will be used internally to categorize all the Pins and all the users, and externally to power Pinterest’s ads targeting. To fulfill both of these use cases, the final knowledge representation needs to:

  1. Be a single root tree structure, instead of a directed acyclic graph (DAG) for near-term downstream consumption. It should however be possible to evolve the taxonomy into a multi-parent DAG;

  2. Provide support for adding attributes (facets) to the interests, in order to support multiple perspectives of the categorization and poly-axial classifications;

  3. Match the Pinterest content, that is, it should include interests that depict a substantial number of Pins and exclude interests that Pinterest has few or no Pins for;

  4. Contain no ambiguous interests—the interests’ names should be clear on their own even after removing the context provided by the tree structure (i.e., the parents in the hierarchy), for example, “cricket” the insect versus the sport;

  5. Ensure that all children of the same parent are mutually exclusive and collectively exhaustive (MECE); and

  6. Prioritize the quality of the interests over their quantity.
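The first requirement (a single-root tree rather than a DAG) is one that can be checked mechanically. The sketch below is an illustration under assumptions, not a tool used in the project: it takes a hypothetical `parent_of` mapping in which every interest points to exactly one parent and the root points to `None`, and reports structural problems. Requirements such as MECE or content coverage need human judgment or usage data and are not addressed here.

```python
def check_single_root_tree(parent_of):
    """Return a list of problems; an empty list means a single-root tree.

    parent_of maps each node to its single parent, and the root to None.
    Using a dict already rules out multiple parents per node.
    """
    problems = []

    roots = [node for node, parent in parent_of.items() if parent is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")

    for node, parent in parent_of.items():
        if parent is not None and parent not in parent_of:
            problems.append(f"{node!r} has unknown parent {parent!r}")

    # Walk upwards from every node; revisiting a node means there is a cycle.
    for node in parent_of:
        seen = set()
        current = node
        while current is not None and current in parent_of:
            if current in seen:
                problems.append(f"cycle reachable from {node!r}")
                break
            seen.add(current)
            current = parent_of[current]

    return problems
```

Evolving the taxonomy into a multi-parent DAG, as the requirement anticipates, would mean replacing the single-parent mapping with a mapping from each node to a set of parents.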

4.0.2 Tooling Requirements

To manage, curate and evolve the Pinterest Taxonomy, Pinterest needs a tool that provides:

  1. The ability for multiple editors located in different geographical locations to work on the same project simultaneously;

  2. A way to track which interests in the taxonomy have been reviewed;

  3. The ability to efficiently reorganize the taxonomy—move an interest to a different branch, merge, rename or deprecate it;

  4. A way to add annotations/metadata to one or multiple interests, e.g., sample Pins, a short description of the interest, synonyms, statistics and attributes of this interest;

  5. Multi-lingual support, that is, support for adding labels in multiple languages and for displaying the full taxonomy in different languages; and

  6. A friendly user interface (UI) to allow people to browse and search for interests in the taxonomy based on their annotations. The UI should provide the ability to share links directly to content in the taxonomy.

5 Ontology Modeling Experiments

We conducted several ontology modeling experiments. The goals of these experiments were: (1) To determine the kind of vocabulary defined in Section 2 that we would need for the project; (2) To settle on (best practice) conventions, for example, rules for consistent naming of the interests; (3) To experiment with editing workflows and define the curation instructions; (4) To evaluate WebProtégé as a tool to satisfy editing requirements; and (5) To see what gaps needed to be filled in terms of tooling.

5.1 Deriving a Seed Ontology

We used the 6,000 interests from the three-level taxonomy to bootstrap a starting ontology. There were several problems with the ontology output from this process:

Lack of Coverage

The interests were not enough to describe all of Pinterest’s content. For example, it had 120 interests for “Men’s Fashion”, over 400 interests for “Women’s Fashion”, but no interests at all for “Children’s Fashion”.

Imbalanced Structure

The taxonomy was very broad and shallow. Moreover, it was inappropriately imbalanced with respect to the number of children per parent. For example, one vertical had only 2 child interests while another had over 80. This imbalance may feel odd to advertisers and may make it harder for them to find the relevant interests.

Irregular Precision

Some areas of the taxonomy were too fine-grained. For example, the part of the taxonomy representing the “Art” vertical contained many interests of the form “11 x 17 posters” or “36 x 48 posters” (representing interests in very specific poster sizes).

Irregular Naming

The naming convention for the terms in the taxonomy was not uniform. Some terms were named in singular form while their siblings were named in plural form. There was also an inconsistent use of prefixes and suffixes.

5.2 Detailed Modeling Pilot Study

Having identified initial problems with the seed ontology, we homed in on two verticals, “Home Decor” and “Fashion”, in order to focus on more detailed modeling issues and to get a better feeling for development environment issues. We chose these verticals for the richness and variety of interests contained in them (to expose modeling issues) and for their prominence on Pinterest. We used seed lists of interests to start constructing ontologies representing these verticals.

5.2.1 Development Tools

During the pilot study phase, we used a number of tools for engineering and communication that we describe next.

Collaborative Editing Environment

We used WebProtégé and its collaboration features throughout the pilot experiment. We made use of threaded discussions to document editorial decisions and point to external references that were considered. We made heavy use of email and Slack notifications, which enabled timely responses to discussion within WebProtégé. We used the change tracking feature of WebProtégé to review recent changes when starting a modelling session, and we used the “live project feed” to monitor current activity.

Communication Tools

We used Slack for communication outside WebProtégé. Both the Pinterest and Protégé teams were already familiar with this tool for internal communication. Because WebProtégé supports “deep linking”, it was easy to paste links to entities in WebProtégé directly into Slack. We used Slack for any discussions unrelated to the ontology content, for example, to set up meetings, to discuss tooling matters, and to report and discuss software issues. We also held teleconferences on a regular basis.

In-person Meetings

We met face-to-face for extended periods of time at the start of the project, in order to make major decisions on workflow and tooling. We held a workshop meeting early in the pilot experiment to simultaneously work on the “Fashion” vertical in the same room. This enabled us to quickly assess usability and tooling issues, and it also helped us to quickly identify and discuss broad modeling issues.

5.2.2 Design Decisions and Modeling Choices

The modeling pilot study enabled us to settle on various modeling choices and engineering conventions:

Interests as Classes

We decided to represent interests as classes. This may seem odd, but ontologically, an instance of an interest represents someone’s (a Pinner’s) own particular, unique interest in something. Thus, classes in the ontology represent interests and not the actual subject of an interest. From here on we use interests and classes interchangeably.

Using classes for interests also side-steps the thorny issue of classes versus individuals. Suppose someone is interested in San Francisco. Ontologically, San Francisco, the place, is an individual. That is, there is just one San Francisco in the domain of discourse (the world). In the taxonomy, we do not explicate the fact that an interest in San Francisco represents an interest about San Francisco the place. While this may seem straightforward, the waters are much muddier when one thinks about things like recipes, computer games or cute videos of cats. Thus, focusing purely on interests as classes helps to keep the modeling clean and simple, and helps to avoid overly complex debates about modeling.

One more important benefit of modeling interests as classes is that interests can be easily specialized. This includes obvious cases of specialization, such as an interest in mid-century architecture is an interest in architecture. It also includes less obvious specializations, such as an interest in 1960’s San Francisco is an interest in San Francisco.

Interest Descriptions

For each interest, we added a label (its preferred name), plus synonyms (if available) and definitions (where warranted). We recorded this information using the following annotation properties:

rdfs:label

as the primary, preferred name/label for an interest in a given language. Every label has a language tag, for example, @en. (Note that we could have chosen skos:prefLabel. We ensure that all labels are unique, so skos:prefLabel annotations could easily be generated from rdfs:label values.)

skos:altLabel

for recording synonyms. We encoded all known synonyms of each interest to support our search and presentation goals.

skos:definition

to include a 1-2 sentence textual definition to clarify the meaning of an interest. This is important, as it provides a shared understanding among the team of what this interest is, especially when it is not a well-known term. Many of the definitions were copied from Wikipedia.

Domain specific annotation properties

for both business usage and curation usage. For example, there are certain interests that are sensitive or brand related, and thus cannot (according to Pinterest policy) be exposed to advertisers. These interests are marked as noAds=true in order to identify them in the engineering pipeline and avoid exposing them in the targeting interface. We also defined properties such as isHumanReviewed, to indicate whether an interest has been human reviewed or not, and to eliminate the possibility of curators reviewing previously reviewed interests.
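To make the role of these annotation properties concrete, the following sketch models an interest record with the annotations listed above and shows how a downstream consumer might use noAds and isHumanReviewed. The `Interest` dataclass, its field names, and the two helper functions are hypothetical. Only the property names in the comments come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Interest:
    label: str                                       # rdfs:label value
    lang: str = "en"                                 # language tag of the label
    alt_labels: list = field(default_factory=list)   # skos:altLabel values
    definition: str = ""                             # skos:definition
    no_ads: bool = False                             # noAds
    is_human_reviewed: bool = False                  # isHumanReviewed


def targetable(interests):
    """Labels that may be exposed in the ads targeting interface."""
    return [i.label for i in interests if not i.no_ads]


def review_queue(interests):
    """Labels that still need a human review."""
    return [i.label for i in interests if not i.is_human_reviewed]
```

With such a record, filtering out sensitive interests and skipping already reviewed ones both reduce to simple checks on a boolean annotation value.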

Naming Conventions

We used title case names, with spaces, for interest names, for example, “Garden Bench”. Ontology engineering recommendations often state that singular noun forms should be used for entity names [7, 8, 10, 11, 12]. However, we determined that it was necessary, and more natural, to use a mix of singular and plural forms based on the forms used in Pinterest’s top queries.

We attempted to normalize the names of interests in a principled way. For example, under “Home Decor Styles” there are a large number of styles. Many of these were not uniformly named. Some were named ending with “Style” or “Styles” (for example, “California Style”), others were named ending in “Interior” or “Interiors” (for example, “Art Deco Interiors”), while others were named ending in “Decor” (for example, “Bohemian Decor”). In these cases, we settled on particular patterns, depending upon the context, and then normalized interests according to these patterns. Whenever we renamed a topic we endeavored to preserve the original name in a skos:altLabel annotation to keep the old name as a synonym.
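A normalization of this kind can be sketched as a suffix rewrite that records the old name as a synonym. The choice of “Style” as the target suffix and the function below are assumptions for illustration. The patterns actually chosen depended on context and are not specified here.

```python
import re

# Suffixes mentioned in the text for interests under "Home Decor Styles".
SUFFIX_PATTERN = re.compile(r"\s+(Styles?|Interiors?|Decor)$")


def normalize_style_name(label, alt_labels, target_suffix="Style"):
    """Rewrite the trailing suffix and keep the old label as a synonym.

    Returns (new_label, new_alt_labels). target_suffix is an assumed
    convention, not the one used in the Pinterest Taxonomy.
    """
    base = SUFFIX_PATTERN.sub("", label)
    new_label = f"{base} {target_suffix}"
    new_alt_labels = list(alt_labels)
    # Preserve the original name (as skos:altLabel) only if it changed.
    if new_label != label and label not in new_alt_labels:
        new_alt_labels.append(label)
    return new_label, new_alt_labels
```

For example, “Art Deco Interiors” would become “Art Deco Style” with “Art Deco Interiors” retained as a synonym, while “California Style” is left unchanged.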

Name Ambiguity

Some topic names in the original Pinterest taxonomy were used in different senses. For example, “Topiary” is both the activity of sculpting plants into three-dimensional shapes, and plants themselves that have been sculpted this way. We disambiguated them in the way that thesauri entries are disambiguated, for example, “Topiary (Plant)” and “Topiary (Gardening Activity)”.

Interest Disambiguation

We frequently had to use the Pinterest text-search functionality to disambiguate obscure names. We struggled with ambiguous names like “privacy screen”, “water scooper”, “valances”, among others. We made extensive use of Wikipedia for providing concise textual definitions for interests to aid curators during the review process. With an eye to advertising—one of the most important uses of the taxonomy—we also viewed external Web sites, such as Bed, Bath and Beyond, Crate & Barrel, IKEA and Walmart in order to compare their product categorization with the kinds of products that are represented by interests.

6 Production Tooling

Given the working relationship between Pinterest and Stanford, WebProtégé was the obvious choice for an editing environment. The pilot experiments that we carried out revealed that WebProtégé was able to meet the majority of the tooling requirements. However, we significantly enhanced WebProtégé to satisfy previously unmet requirements and, in places, to streamline existing cumbersome editing operations. In what follows, we describe the key aspects of WebProtégé in the context of the tool requirements for this project.

Figure 3: The WebProtégé UI. The figure shows the main WebProtégé user interface displaying the ‘Aero’ project. The left hand side displays the class hierarchy, containing various “tagged” classes. The center displays content for the selected class, in this case L1011. This center panel can be switched to display the history for the selected class. The right hand side displays threaded discussions for the selected class and recent editing and commenting activity. (Note that this project bears no relation to the Pinterest project. It is used here merely for illustration purposes.)

Support for Multi-User Collaboration and Sharing

WebProtégé is a cloud-based OWL ontology development environment where users perform editing and viewing tasks in a Web browser. It allows geographically dispersed users to collaborate in real-time.

When a change is made to an ontology, all collaborators see the change in real time. Changes are also tracked, in the form of axiom additions and removals. Information about a change is captured along with metadata about who performed the change and when the change was performed. The complete project change history is available for collaborators to peruse. Pinterest uses the change history to keep track of contributions by curators, and to carry out downstream analyses.

Crucially, WebProtégé users can be assigned different roles within the context of a project. This makes it possible to cater for workflows that comprise multiple teams with different responsibilities. In the case of the Pinterest project, there is a core team of editors surrounded by a larger group of reviewers/commenters. While the “under the hood” role management capabilities provided by WebProtégé are very fine-grained, the default user-interface supports high level coarse-grained privileges, namely “Manage”, “Edit”, “Comment” and “View”. At the start of the project, it was not clear to us whether these privileges would be sufficient. However, as editing and refinement of the ontology proceeded, it became clear that these basic permissions worked well enough.

Provision of a User Friendly Interface

WebProtégé provides editing support for the complete OWL 2 syntax. However, by default, WebProtégé displays a simplified editing interface (Figure 3) that we believe is sufficient for the majority of ontology projects [3].

This interface allows users to edit ontologies in a frame based style, specifying parents (rdfs:subClassOf under the hood), annotations, and relationships (rdfs:subClassOf with super classes that match specific patterns of class expressions under the hood) in an intuitive frame-like way. An image of this default user interface is displayed in Figure 3.

So far, this simple interface has worked well for the Pinterest project, with the bulk of editing being annotation based editing and hierarchy editing. Pinterest ontology curators required little training in how to use the interface—they attended a one hour training session on the WebProtégé interface and ontology editing conventions.

Support for Ontology Reorganization

As mentioned previously, the initial input for the Pinterest ontology was a spreadsheet that had been derived from user data. This provided a seed taxonomy with at most three levels of depth and mixed bags of non-unique terms at each level. One of the perceived benefits of moving to OWL and using WebProtégé was that the ontology would be easier to browse and edit compared to the existing spreadsheet based approach. While this has largely proven to be true, we had to extend WebProtégé with two new features for streamlining the editing workflow.

The first feature that we added was a workflow to merge entities. This allows multiple entities to be selected and then merged into a target entity. This operation performs a number of complex steps under the hood, such as replacing references to the entities being merged with a reference to the target entity. It leaves merged entity IRIs intact, but deprecates them and preserves annotations on them, for record keeping purposes.

The second feature that we added was a bulk move operation. The initial cleanup step involved a significant amount of re-organizing edits to be performed, sometimes between large disparate branches of the taxonomy hierarchy. The new bulk move feature allows multiple entities to be selected and in the next step a new parent entity to be chosen for them. While simple, this feature proved to be much more effective and more reliable than using drag and drop.

Finally, both merge operations and move operations typically involve multiple atomic changes to achieve the desired outcome. WebProtégé bundles up these atomic changes into a single composite change operation, which can then be applied with a manually entered commit message that appears in the change history log of the project.
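The merge workflow and the bundling of atomic changes can be illustrated with a simplified model. This is a sketch under assumptions and does not reflect WebProtégé's internal data structures: the hierarchy is a plain child-to-parent dictionary, annotations are nested dictionaries, and the tuple and dictionary shapes used for change records are invented for the example.

```python
import copy


def merge_entities(parent_of, annotations, sources, target, message):
    """Merge the source entities into target as one composite change.

    References to a source (here: parent links) are redirected to target.
    Sources are kept, marked deprecated, and retain their annotations.
    Returns (new_parent_of, new_annotations, composite_change).
    """
    parent_of = dict(parent_of)
    annotations = copy.deepcopy(annotations)
    changes = []

    for child, parent in list(parent_of.items()):
        # Skip the target itself to avoid making it its own parent.
        if parent in sources and child != target:
            parent_of[child] = target
            changes.append(("reparent", child, parent, target))

    for source in sources:
        annotations.setdefault(source, {})["deprecated"] = True
        changes.append(("deprecate", source))

    composite = {"message": message, "changes": changes}
    return parent_of, annotations, composite
```

The returned `composite` record plays the role of the single entry, with its commit message, that would appear in the change history. The inputs are copied rather than modified, so a failed or rejected merge leaves the original state untouched.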

Figure 4: Comments management and notifications. The main part of the figure shows the Comments tab. Comments can be sorted and viewed by entity. The lower left-hand side inset shows a notification email sent out to participants after a comment has been posted. The right-hand side inset shows integration with the chat app Slack.

Support for Metadata Editing

Interest classes in the ontology are richly described with entity annotations. These annotations can be roughly split into two types: (1) Content description annotations, which provide synonyms, descriptions, visibility flags, and pointers to example Pins; and (2) Status annotations, which provide “housekeeping” information about the editorial status of classes. In both cases we quickly learnt that there was a recurring desire to apply edits to a large number of annotations at once, from switching a status flag from ‘false’ to ‘true’ to applying a consistent naming convention based on regular expressions. We therefore added a powerful bulk annotation editing interface to WebProtégé. This feature allows entity annotations to be “selected”, using patterns, and then modified, deleted or augmented based upon the criteria used to select them.
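The idea of selecting annotations by pattern and then rewriting them can be sketched in a few lines. This is not the WebProtégé implementation. The dictionary layout and the `bulk_edit` function are assumptions, and the sketch covers only the "modify" case.

```python
import re


def bulk_edit(annotations, prop, pattern, replacement):
    """Rewrite every value of prop that matches pattern; return the count.

    annotations maps entity name -> {property name -> string value}
    and is modified in place.
    """
    regex = re.compile(pattern)
    changed = 0
    for props in annotations.values():
        value = props.get(prop)
        if value is not None and regex.search(value):
            new_value = regex.sub(replacement, value)
            if new_value != value:
                props[prop] = new_value
                changed += 1
    return changed
```

Flipping a status flag then becomes `bulk_edit(annotations, "isHumanReviewed", r"^false$", "true")`, and a naming convention can be applied by passing a suitable pattern for the label property.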

Support for Reviewing and Quality Control

Besides offering real-time distributed editing capabilities, WebProtégé provides support for collaborative interaction in the form of “entity discussion threads” (Figure 3, right hand side). Discussion threads can contain user mentions, links to entities and links to external resources.

This discussion functionality has been used for a number of purposes in the Pinterest project, including for filing interest related issues and for soliciting reviews of interest descriptions. Issues were sorted and managed via the “Comments” tab (Figure 4), which provides basic functionality for sorting issues by creation or modification time and by entity, and largely proved sufficient for the task at hand.

Figure 5: Tags management in WebProtégé. Each tag is assigned a label, an optional description and a color. Tags can be defined on a per project basis. These tags can then either be manually assigned to entities or they can be assigned in an automated live query-based way.

When discussions are posted to a project, collaborators are notified via email or Slack (https://slack.com; see Figure 4) so that they do not miss discussion posts that are relevant to them. Notification emails contain deep links to the interests being discussed so that it is possible to “jump” straight to the relevant portion of the ontology in WebProtégé. This proved to be an effective way of engaging members of the Pinterest knowledge management team in active discussion.

As part of the taxonomy quality control process, all classes representing interests require human review. Classes that require a review are flagged with annotations to indicate the review status. Early on in the project, Pinterest requested custom highlighting for classes, so that certain classes, for example those requiring a review, would stand out from other classes. To support this, we added “entity tags” to WebProtégé. (Recall that entities, in OWL, are classes, properties, individuals and datatypes.)

Entity tags can be used to highlight entities in a colorful, enticing way in the WebProtégé user interface. Example entity tags are shown on the left hand side of Figure 3, where there are tags for flagging missing definitions and missing Hungarian labels, among others. Multiple tags can be specified for a given project, as shown in Figure 5, and multiple tags can be assigned to a single entity. Not only do tags “call out” entities in the user interface, they are also searchable. Thus, it is possible to list and filter entities that have given tags.

We designed Entity Tags so that they could either be assigned to entities in a manual, explicit fashion, or assigned in an automated manner based on ontology content. Figure 6 shows the set up page for automated tag assignment. While a presentation of the full tagging capabilities is beyond the scope of this paper, it is possible to tag entities that match a given set of rules/criteria. This tagging feature supports complex, multiple conjunctive and disjunctive criteria, along with paths of values to be matched. Many types of matches are possible, such as matches by specific value, parts of values (regular expressions) and ranges. Furthermore, it is possible to use entity matching criteria to check constraints that involve multiple values, such as label uniqueness (in the context of a given language) and annotation value disjointness (to enforce rules such as preferred labels being disjoint from alternative labels).

Figure 6: Automated tag assignment. Tags can be automatically assigned to entities based upon ontology structural criteria. Here, the criteria specify that any descendants of “Airbus Aircraft” that are missing a value for skos:definition will be tagged with the “Missing Definition” tag.

It is worth noting that the tagging functionality, that is, the entity matching criteria coupled with automated tag assignments, comes close to a SHACL Core [5] implementation in a user-friendly guise.
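To make the idea of rule-based tagging concrete, the following is a minimal Python sketch of two such checks: the “Missing Definition” rule from Figure 6, and label uniqueness within a language. It is not WebProtégé code. The data structures, the function names and the toy aircraft classes other than “Airbus Aircraft” are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: class -> parent, and class -> annotation values.
PARENT = {
    "Airbus A320": "Airbus Aircraft",
    "Airbus A380": "Airbus Aircraft",
    "Airbus Aircraft": "Aircraft",
    "Boeing 747": "Aircraft",
}
ANNOTATIONS = {
    "Airbus A320": {"skos:definition": ["A narrow-body airliner."]},
    "Airbus A380": {},
    "Boeing 747": {},
}


def is_descendant(cls, ancestor):
    """Walk up the (single-parent) hierarchy looking for `ancestor`."""
    current = PARENT.get(cls)
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def tag_missing_definition(root):
    """Return the descendants of `root` that have no skos:definition value."""
    return sorted(
        cls
        for cls in PARENT
        if is_descendant(cls, root)
        and not ANNOTATIONS.get(cls, {}).get("skos:definition")
    )


def duplicate_labels(labels):
    """Find labels shared by several entities within the same language.

    `labels` is an iterable of (entity, language_tag, label) triples.
    Labels are compared case-insensitively, which is an assumption.
    """
    seen = defaultdict(set)
    for entity, lang, label in labels:
        seen[(lang, label.casefold())].add(entity)
    return {key: sorted(ents) for key, ents in seen.items() if len(ents) > 1}
```

Calling `tag_missing_definition("Airbus Aircraft")` on the toy data selects only the descendant that lacks a definition. `duplicate_labels` reports a clash only when two entities share a label under the same language tag, so an identical label in another language is not flagged.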

Support for Multi-lingual Editing and Viewing

WebProtégé has always had support for specifying IETF language tags [9] via auto-completion in the default editing interface. However, it soon became clear that the Pinterest curators required more elaborate functionality for viewing and checking language tags.

We first added support for a project default language tag setting. The default language is added to the labelling annotation when creating new entities. This saves a lot of clicking and typing when creating new interests.

We made significant changes to the rendering mechanism in WebProtégé so that it is possible to specify a list of primary and secondary display languages. Secondary display languages are used to derive secondary display names for entities, which are displayed alongside the primary display names in the various hierarchies (Figure 7) and lists throughout WebProtégé. This provides context and makes it easier for language specialists to perform translations.

Finally, we added rule templates, as part of the previously mentioned entity tagging functionality, so that it is possible to display colored indicators next to entities that are missing certain language tags (Figure 7).
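The display-name and missing-language behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LABELS` dictionary, the function names, the sample Hungarian label and the parenthesized rendering of the secondary name are all hypothetical and do not reflect how WebProtégé stores or renders labels.

```python
# Hypothetical toy data: entity -> list of (label text, language tag).
LABELS = {
    "Shoes": [("Shoes", "en"), ("Cipők", "hu")],
    "Sandals": [("Sandals", "en")],
}


def label_in(labels, lang):
    """Return the first label carrying exactly the language tag `lang`."""
    for text, tag in labels:
        if tag == lang:
            return text
    return None


def display_name(entity, primary="en", secondary="hu"):
    """Render the primary name, followed by the secondary name if present."""
    labels = LABELS[entity]
    first = label_in(labels, primary) or entity  # fall back to the key
    second = label_in(labels, secondary)
    return f"{first} ({second})" if second else first


def missing_language(lang):
    """List entities with no label in `lang` (candidates for a colored tag)."""
    return sorted(e for e, labels in LABELS.items() if label_in(labels, lang) is None)
```

With the toy data, `missing_language("hu")` returns the entities that would receive the “hu” indicator, while `display_name` shows how a secondary name could sit beside the primary one.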

7 Production Development

Eight Pinterest employees were involved in the development of the ontology, most of whom were not engineers and had no prior knowledge of OWL ontologies or WebProtégé. The eight ontology curators had a one-hour training session on how to use WebProtégé, and they were able to start curating content right away. Overall, it took the curators less than one month to build and finalize the ontology. After this first round of curation, some partner teams such as Ads and Sales reviewed the ontology, provided feedback and suggestions for improvements, and even edited the ontology themselves with minimal training.

Throughout the entire development process there were 2,000 comments spanning 1,000 discussion/issue threads on interests, within WebProtégé. Overall the ontology went through 38,000+ revisions before its current version (where each revision involves potentially multiple axiom changes). The final ontology has 11,000 classes (interests), 24 verticals (top-level interests), and up to 12 levels of depth in certain branches of the hierarchy. It contains 145,000 axioms, out of which 25,000 are logical axioms and 95,000 are annotation axioms.

To consume the production ontology, Pinterest engineers built a Python-based pipeline that processes it and generates relational database tables for existing internal applications and other Pinterest tooling pipelines.
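The paper does not describe this pipeline in detail. As a rough sketch of the general idea only, the Python below flattens a list of is-a edges into a relational table using the standard-library `sqlite3` module. The table schema, the `build_tables` function and the sample edges are assumptions for illustration and are not Pinterest's actual pipeline.

```python
import sqlite3

# Hypothetical (child, parent) is-a edges extracted from SubClassOf axioms.
EDGES = [("Sandals", "Shoes"), ("Shoes", "Fashion"), ("Boots", "Shoes")]


def build_tables(edges):
    """Load a single-parent taxonomy into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interest (name TEXT PRIMARY KEY, parent TEXT, depth INTEGER)"
    )
    parent = dict(edges)
    names = set(parent) | set(parent.values())

    def depth(name):
        # Count the is-a steps from `name` up to its top-level vertical.
        steps = 0
        while name in parent:
            name = parent[name]
            steps += 1
        return steps

    conn.executemany(
        "INSERT INTO interest VALUES (?, ?, ?)",
        [(n, parent.get(n), depth(n)) for n in sorted(names)],
    )
    conn.commit()
    return conn


conn = build_tables(EDGES)
rows = conn.execute("SELECT name, depth FROM interest ORDER BY name").fetchall()
```

Here a vertical is simply a row whose `parent` is NULL and whose `depth` is 0. Storing the depth alongside each interest lets downstream SQL consumers filter by hierarchy level without walking the tree themselves.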

Figure 7: An example of secondary language display names. The primary display language is “en” (English). The secondary display language is “hu” (Hungarian). The secondary display name is shown to the right of the primary display name. The colorful tag “hu” highlights classes that are missing the “hu” language tag.

8 Discussion

Throughout the development of the Pinterest Taxonomy, the Pinterest and Protégé teams encountered various challenges both in terms of modeling choices and tool support for ontology development. We describe some of these challenges below, and discuss how they will shape some future directions of our work.

Multiple Inheritance and Cross-Vertical Interests

Some interests would be best described with multiple parents. We frequently wanted to classify interests not only by their “primary type”, but also by their intended role, their location, and the material that they are made from. For example, an interest in “Bathroom Lighting” is an interest in “Bathroom Decor”. However, “Bathroom Lighting” could also be under “Light Fixtures”. Allowing multiple parents for “Bathroom Lighting” would let advertisers of both bathroom design and lighting stores target Pinners who are interested in “Bathroom Lighting”.
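To illustrate what multiple parents would mean for targeting, here is a minimal Python sketch that collects all ancestors of an interest in a multi-parent hierarchy. The `PARENTS` mapping is hypothetical. In particular, placing both “Bathroom Decor” and “Light Fixtures” under “Home Decor” is an assumption made for the example.

```python
# Hypothetical multi-parent hierarchy: interest -> list of direct parents.
PARENTS = {
    "Bathroom Lighting": ["Bathroom Decor", "Light Fixtures"],
    "Bathroom Decor": ["Home Decor"],
    "Light Fixtures": ["Home Decor"],
}


def ancestors(interest):
    """Return every interest reachable by following is-a edges upward."""
    seen = set()
    stack = list(PARENTS.get(interest, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # a shared ancestor is visited only once
            seen.add(node)
            stack.extend(PARENTS.get(node, []))
    return seen
```

An advertiser targeting any interest in `ancestors("Bathroom Lighting")` would reach Pinners interested in “Bathroom Lighting”, which is the effect that a single-parent tree cannot provide for both bathroom design and lighting stores at once.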

Multiple Relationship Types among Interests

Currently we only have one type of relationship between interest classes: is-a. For example, an interest in “Sandals” is an interest in “Shoes”. However, we see a need for more relationships. For example, “Thanksgiving Recipe” (a “Food and Drinks” interest), “Thanksgiving Decoration” (a “Home Decor” interest) and “DIY Thanksgiving Card” (a “DIY and Crafts” interest) are all related to the Thanksgiving interest, so a Pinner interested in one of them in the run-up to Thanksgiving would very likely be interested in the other two.

New Global Classes and Association with Interests

Besides categorizing Pins and users against interests, we can also categorize them by Attributes, for example “Colors”, “Brands” and “Materials”. Since Pinterest powers shopping as well, we expect that understanding supply and demand from the Attribute perspective would give Pinners more relevant products. Attributes should be defined as global (cross-vertical) classes, and they should be associated with applicable interests via appropriate relationships.

Richer Axiomatization

The ontology has gone through over 38,000 revisions in WebProtégé, each composed of potentially multiple axiom changes. The logical axioms used in the ontology are SubClassOf axioms. The current logical expressivity of the Pinterest Taxonomy falls under all three OWL 2 profiles: EL, RL, and QL. These profiles benefit from desirable computational properties, such as polynomial-time (or lower) worst-case complexity for core reasoning tasks. A next iteration of the Pinterest Taxonomy will make use of existential restrictions (i.e., SomeValuesFrom class expressions) to support facet-based classification of interests. Such an extension would likely not go beyond the expressivity of OWL 2 EL, and would provide the benefit of automatically classifying interests by their asserted features, such as color and the materials that things are made from, whilst preserving a primary axis of asserted classification that is required by some of the downstream consumers of the ontology.
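The intended effect of facet-based classification can be approximated with a small Python sketch. This is a deliberately simplified structural check, not an OWL 2 EL reasoner: it ignores subsumption between fillers and between properties, and all class names, the `madeFrom` property and the data structures are hypothetical.

```python
# Hypothetical asserted primary hierarchy (single parent per interest).
SUBCLASS_OF = {
    "Leather Ankle Boots": "Leather Boots",
    "Leather Boots": "Boots",
    "Boots": "Shoes",
}
# Asserted features, standing in for axioms such as
# "Leather Boots SubClassOf madeFrom some Leather".
FEATURES = {"Leather Boots": {("madeFrom", "Leather")}}
# Facet classes defined purely by the features they require.
DEFINED = {"Leather Goods": {("madeFrom", "Leather")}}


def all_features(interest):
    """Collect features asserted on `interest` or inherited from ancestors."""
    feats = set()
    current = interest
    while current is not None:
        feats |= FEATURES.get(current, set())
        current = SUBCLASS_OF.get(current)
    return feats


def inferred_parents(interest):
    """Return facet classes whose required features the interest satisfies."""
    feats = all_features(interest)
    return sorted(name for name, required in DEFINED.items() if required <= feats)
```

The asserted `SUBCLASS_OF` hierarchy is left untouched, which mirrors the requirement that the primary axis of classification be preserved for downstream consumers. The facet parents are computed on top of it rather than asserted by hand.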

9 Summary

During the past year’s collaboration with the Protégé team, the Pinterest team have concluded that WebProtégé is by far the most suitable tool for developing the Pinterest Taxonomy. Not only has it proved to be a vast improvement upon the spreadsheet-based taxonomy curation, it has also worked better than tools developed in-house. Since adopting WebProtégé, the curation and development cycle of the Pinterest Taxonomy has drastically shortened. With the old spreadsheet-based representation, it could easily have taken six months or longer to reach the stage that we are at today. The process would have been more error-prone and far more tedious. WebProtégé facilitated the entire development process and allowed Pinterest to build out the Pinterest Taxonomy in only two months.

The flexibility of the OWL ontology representation allows the Pinterest Taxonomy to be easily expanded to a DAG. Adding further logical axiomatization to encode facets of interests offers the possibility of improving downstream search and recommendation applications. Besides the concrete outcome of an ontology, the use of WebProtégé has had a notable positive effect on strengthening the engagement of internal teams (from advertising and sales) with the Content Management team. Previously, it was a struggle to do this with the spreadsheet-based editing environment.

In October 2018, Pinterest released the newly developed Pinterest Taxonomy for interest-based ads targeting with over 1,500 new interests to target. Advertisers on Pinterest have, overall, responded positively to our work on the taxonomy. Additionally, Pinterest have determined that the new representation of their content has measurably increased revenue.

Finally, while there are a number of desirable improvements that we could make to the tooling and to the ontology itself, we hope that our experience and insights of using OWL and WebProtégé at Pinterest, to model real data from industry, are useful for the Semantic Web community.

Acknowledgements

We extend a huge thanks to John Milinovich (prev. at Pinterest), who played a pivotal role in establishing the collaboration between Pinterest and the Protégé team. We also thank Lance Riedel (Pinterest) and Brian Johnson (prev. at Pinterest), who steered the project in its earlier stages. The work described in this paper has been fully supported by Pinterest. Core WebProtégé work is supported by NIH NIGMS Grant GM121724.

References

  • [1] Taxonomy, https://en.wikipedia.org/wiki/Taxonomy
  • [2] Grau, B.C., et al.: OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on the World Wide Web 6(4), 309–322 (2008)
  • [3] Horridge, M., et al.: Simplified OWL ontology editing for the web: Is WebProtégé enough? In: Proc. of the Int. Semantic Web Conference (ISWC). pp. 200–215. Springer (2013)
  • [4] Horridge, M., et al.: WebProtégé: A collaborative Web-based platform for editing biomedical ontologies. Bioinformatics 30(16), 2384–2385 (2014)
  • [5] Knublauch, H., Kontokostas, D.: Shapes Constraint Language (SHACL). W3C Recommendation 11(8) (2017), https://www.w3.org/TR/shacl
  • [6] Milinovich, J.: Introducing the Pinterest Taste Graph and enhanced targeting (2017), https://business.pinterest.com/en/blog/introducing-the-pinterest-taste-graph-and-enhanced-targeting
  • [7] Montiel-Ponsoda, E., et al.: Style guidelines for naming and labeling ontologies in the multilingual web. In: Proc. of the Int. Conf. on Dublin Core and Metadata Applications (2011)
  • [8] Noy, N.F., et al.: Ontology development 101: A guide to creating your first ontology. Stanford Knowledge Systems Laboratory technical report KSL-01-05 (2001)
  • [9] Phillips, A., Davis, M.: BCP 47 - Tags for Identifying Languages. (September 2006), http://www.rfc-editor.org/rfc/bcp/bcp47.txt
  • [10] Rector, A., et al.: OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In: Proc. of the Int. Conf. on Knowledge Engineering and Knowledge Management. pp. 63–81. Springer (2004)
  • [11] Schober, D., et al.: Towards naming conventions for use in controlled vocabulary and ontology engineering. In: Proc. of the Annual Bio-Ontologies Meeting. pp. 87–90 (2007)
  • [12] Svátek, V., Šváb-Zamazal, O.: Entity naming in semantic web ontologies: Design patterns and empirical observations. University of Economics, Prague pp. 1–12 (2010)