Consolidating Commonsense Knowledge

06/10/2020 ∙ by Filip Ilievski, et al. ∙ USC Information Sciences Institute ∙ Northeastern University

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths, and weaknesses. In this paper, we list representative sources and their properties. Based on this survey, we propose principles and a representation model in order to consolidate them into a Common Sense Knowledge Graph (CSKG). We apply this approach to consolidate seven separate sources into a first integrated CSKG. We present statistics of CSKG, present initial investigations of its utility on four QA datasets, and list learned lessons.




1 Introduction

Capturing, representing, and leveraging commonsense knowledge has been a paramount goal for AI since its early days, cf. McCarthy (1960). In the light of modern large (commonsense) knowledge graphs and various neural advancements, the DARPA Machine Common Sense program Gunning (2018) represents a new effort to understand commonsense knowledge through question-answering evaluation benchmarks. An example of such a question, from the SWAG dataset Zellers et al. (2018), describes a woman who takes a seat at the piano:

                   Q: On stage, a woman takes a seat at the piano. She:
                      1. sits on a bench as her sister plays with the doll.
                      2. smiles with someone as the music plays.
                      3. is in the crowd, watching the dancers.
                   -> 4. nervously sets her fingers on the keys.

Realizing that the logical next step is her “nervously setting her fingers on the keys” is out of reach for typical information retrieval strategies, as there is no lexical overlap between the situation and the correct answer. Although language models Devlin et al. (2018); Liu et al. (2019) capture linguistic patterns that allow them to perform well on many questions, they have no mechanism to fill gaps of knowledge in communication.[1] Filling such gaps requires more complex, situational reasoning, for which the language models need to be enriched with suitable background knowledge, as in Lin et al. (2019).

[1] Recent work provides evidence that language models, while probably useful and often impressive, are non-robust when applied to semantic tasks, lacking mechanisms to understand plausibility or keep track of evolving states of events and entities Marcus (2020). They struggle with a higher number of inference steps Richardson and Sabharwal (2019), role-based event prediction Ettinger (2020), as well as numeric, emotional, and spatial inference Bhagavatula et al. (2019). On the other hand, systems like KagNet Lin et al. (2019) and HyKAS Ma et al. (2019) have managed to enhance language models by combining them with background knowledge from ConceptNet Speer et al. (2017).

Intuitively, graphs of (commonsense) knowledge contain such background knowledge that humans possess and apply, but machines cannot access or distill directly in communication. A number of such knowledge sources exist today, which presents a unique opportunity for reasoning in downstream tasks. Taxonomies, like WordNet Miller (1995), organize conceptual knowledge into a hierarchy of classes. An independent ontology, coupled with rich instance-level knowledge, is provided by Wikidata Vrandečić and Krötzsch (2014), a structured version of Wikipedia. FrameNet Baker et al. (1998), on the other hand, defines an orthogonal structure of frames and roles, each of which can be filled with a WordNet/Wikidata class or instance. Sources like ConceptNet Speer et al. (2017) or WebChild Tandon et al. (2017) provide more ‘episodic’ commonsense knowledge, whereas ATOMIC Sap et al. (2019a) captures pre- and post-situations for an event. Finally, image description datasets, like Visual Genome Krishna et al. (2017), contain visual commonsense knowledge.

Considering the above example, ConceptNet’s triples state that pianos have keys and are used to perform music, which supports the correct option and discourages answer 2. WordNet states specifically, though in natural language, that pianos are played by pressing keys. According to an image description in Visual Genome, a person could play piano while sitting and having their hands on the keyboard. In natural language, ATOMIC indicates that before a person plays piano, they need to sit at it, be on stage, and reach for the keys. ATOMIC also lists strong feelings associated with playing piano. FrameNet’s frame of a performance contains two separate roles for the performer and the audience, meaning that these two are distinct entities, which can be seen as evidence against answer 3. While these sources clearly provide complementary knowledge that can help commonsense reasoning, their representation formats, principles and foci are different, making integration difficult.

In this paper, we propose an approach for integrating these (and more) sources into a single Common Sense Knowledge Graph (CSKG). We start by surveying existing sources of commonsense knowledge to understand their particularities (section 2). We summarize key challenges and related efforts on consolidating commonsense knowledge in section 3. Based on the survey and the listed challenges, we devise five principles and a representation model for a consolidated CSKG (section 4). In section 5 we apply our approach to build the first version of CSKG, by combining seven complementary, yet disjoint, sources. Here, we also compare the evidence provided by CSKG to that provided by ConceptNet on four commonsense QA datasets. In section 6 we reflect on CSKG and discuss its role in future research. We conclude the paper in section 7.

| Source | Describes | Creation | Size | Mappings | Examples |
|---|---|---|---|---|---|
| ConceptNet | everyday objects, actions, states, relations | automatic extraction | 36 relations, 8M nodes, 21M edges | WordNet, DBpedia, OpenCyc, Wiktionary | /c/en/piano, /c/en/piano/n, /c/en/piano/n/wn, /r/relatedTo |
| WebChild | ditto | crowd-sourcing | 4 relation groups, 2M nodes, 18M edges | WordNet | hasTaste, fasterThan |
| ATOMIC | event pre/post-conditions | crowd-sourcing | 9 relations, 300k nodes, 877k edges | ConceptNet, Cyc | wanted-to, impressed |
| Wikidata | instances, concepts, relations | crowd-sourcing | 1.2k relations, 75M objects, 900M edges | various | wd:Q1234, wdt:P31 |
| CEO | event pre/post/during-conditions | manual | 121 properties, 223 events | FrameNet, SUMO | ceo:Damaging, hasPostSituation |
| WordNet | words, concepts, relations | manual | 10 relations, 155k words, 176k synsets | - | dog.n.01, hypernymy, meronymy |
| Roget | words, relations | manual | 2 relations, 72k words, 1.4M edges | - | truncate, antonym |
| VerbNet | verbs, relations | manual | 273 top classes, 23 roles, 5.3k senses | FrameNet, WordNet | perform-v, performance-26.7-1 |
| FrameNet | frames, roles, relations | manual | 1.9k edges, 1.2k frames, 12k roles, 13k lexical units | - | Activity, Change_of_leadership, New_leader |
| Visual Genome | image objects, relations, attributes | crowd-sourcing | 42k relations, 3.8M nodes, 2.3M edges, 2.8M attributes | WordNet | fire hydrant, white dog |
| ImageNet | image objects | crowd-sourcing | 14M images, 22k synsets | WordNet | dog.n.01 |
| Flickr30k | image objects | crowd-sourcing | 30k images, 750 objects | - | her backyard, red bags |

Table 1: Survey of existing sources of commonsense knowledge.

2 Sources of Common Sense Knowledge

We survey existing commonsense knowledge sources: ConceptNet Speer et al. (2017), WebChild Tandon et al. (2017), ATOMIC Sap et al. (2019a), Wikidata Vrandečić and Krötzsch (2014), CEO Segers et al. (2018), WordNet Miller (1995), Roget Kipfer (2005), VerbNet Schuler (2005), FrameNet Baker et al. (1998), Visual Genome Krishna et al. (2017), ImageNet Deng et al. (2009), and Flickr30k Plummer et al. (2016).[2] Table 1 summarizes their content, creation method, size, external mappings, and example resources.

[2] Labels that refer to the same image object in Flickr30k were clustered by van Miltenburg (2016).

Primarily, we observe that commonsense knowledge is spread over a number of sources with different foci: commonsense knowledge graphs (e.g., ConceptNet), general-domain knowledge graphs (e.g., Wikidata), lexical resources (e.g., WordNet, FrameNet), taxonomies (e.g., Wikidata, WordNet), and visual datasets (e.g., Visual Genome). Together, these sources cover a rich spectrum of knowledge, ranging from everyday knowledge, through event-centric knowledge and taxonomies, to visual knowledge. While the taxonomies have been created manually by experts, most of the commonsense and visual sources have been created by crowdsourcing. Similarly, commonsense and general knowledge graphs tend to be relatively large, with millions of nodes and edges, whereas the taxonomies and the lexical sources are notably smaller. Despite the diverse nature of these sources, we observe that many contain mappings to WordNet, as well as to a number of other sources. These mappings might not be complete, e.g., only a small portion of ATOMIC can be mapped to ConceptNet and less than 1% to Cyc. Nevertheless, these high-quality mappings provide an opening for consolidation of commonsense knowledge, a goal we pursue in this paper.

3 Problem Statement

Combining such commonsense knowledge sources in a single graph faces three key challenges.

Firstly, the sources follow different knowledge modeling approaches. One such difference concerns the relation set: there are very few relations in ConceptNet and WordNet, but (tens of) thousands of them in Wikidata and Visual Genome. Consolidating these sources then inherently requires a decision on how to model the relations. The granularity of knowledge is another factor of variance. While regular RDF triples fit some sources (e.g., ConceptNet), representing entire frames (e.g., in FrameNet), event conditions (e.g., in ATOMIC), or compositional image data (e.g., Visual Genome) might benefit from a more open format. An ideal representation would support the entire spectrum of granularity.

Secondly, as a number of these sources have been created to support natural language applications, they often contain imprecise descriptions. Natural language phrases are often the main node types in the provided knowledge sources, which provides the benefit of easier access for natural language algorithms, but it introduces ambiguity which might be undesired from a formal semantics perspective. An ideal representation would consolidate various phrasings that share a concept or a referent, while still allowing easy and efficient access to these concepts based on their natural language labels or aliases.

Thirdly, although these sources contain links to existing ones, we observe sparse overlap. As these external links are typically to WordNet, and vary in terms of their version (3.0 or 3.1) or target (lemma or synset), the sources are still disjoint and establishing (identity) connections is difficult. Bridging these gaps, through optimally leveraging existing links, or extending them with additional ones automatically, is a modeling and integration challenge.

Previous efforts that combine commonsense resources exist. A unidirectional manual mapping from VerbNet classes to WordNet and FrameNet is provided by the Unified Verb Index Trumbo (2006). The Predicate Matrix De Lacalle et al. (2016) has a full automatic mapping between lexical resources, including FrameNet, WordNet, and VerbNet. PreMOn Corcoglioniti et al. (2016) formalizes these in RDF. Miller and Gurevych (2014); McCrae (2018) produce partial mappings between WordNet and Wikipedia/DBpedia. Recent systems integrate parts of these sources in an ad-hoc manner to reason on a downstream task, e.g., Zareian et al. (2020) combine edges from Visual Genome, WordNet, and ConceptNet in a neural network that produces a scene graph from an image.

4 Approach

To address the aforementioned challenges, we devise principles and a respective representation format that are driven by: simplicity, modularity, and utility. It should be simple to integrate the graph represented in this format and its arbitrary subsets in reasoning systems, compute (graph and word) embeddings, and run off-the-shelf link prediction tools.

4.1 Principles

We propose that the construction of a unified CSKG should follow five principles:

P1. Embrace heterogeneity of nodes. While building CSKG, one should preserve the natural node diversity inherent to the variety of sources considered. This entails blurring the distinction between objects (such as those in Visual Genome or Wikidata), classes (such as those in WordNet or ConceptNet), words (in Roget), actions (in ATOMIC or ConceptNet), frames (in FrameNet), and states (as in ATOMIC). It also allows formal nodes, describing unique objects, to co-exist with fuzzy nodes describing ambiguous lexical expressions.

P2. Reuse edge types across resources. To support reasoning algorithms, edge types should be kept to a minimum and reused across resources wherever possible. For instance, the ConceptNet edge type /r/RelatedTo could be reused to relate a Visual Genome object (e.g., ‘piano’) to its attributes (e.g., ‘black’ or ‘room’). Note that we do not propose to impoverish the semantics of existing relations.

P3. Leverage external links. The separate graphs are mostly disjoint according to their formal knowledge. However, high-quality links may exist or may be easily inferred, in order to connect these graphs and enable path finding. For instance, while ConceptNet and Visual Genome do not have direct connections, they both make reference to WordNet synsets. Investing an effort in aligning these WordNet synsets would produce a number of very valuable connections between the two knowledge sources.

P4. Generate high-quality probabilistic links. Experimenting with the inclusion of additional probabilistic links would be beneficial, as it would combat sparsity and help path finding algorithms reason over CSKG. These could be inferred with off-the-shelf link prediction algorithms, or with specialized algorithms (see section 5 for an example).

P5. Enable access to labels. The text typically associated with KG entities, like labels or aliases, provides application-friendly and human-readable access to the CSKG. It can also help us unify descriptions of the same/similar concept across sources. We need to ensure that the graph format supports easy and efficient natural language queries over this text.

4.2 Representation

We model CSKG as a property graph, consisting of node and edge information in a tabular format. We opted for this representation rather than the traditional RDF/OWL2 because it allows us to fulfill our goals (of simplicity and utility) and follow our principles more directly, without compromising on the format. For instance, natural language access (principle P5) to RDF/OWL2 nodes requires graph traversal over its rdfs:label relations. Including both reliable and probabilistic nodes would require a mechanism to easily indicate edge weights, which in RDF/OWL2 entails inclusion of blank nodes and a number of additional edges. Moreover, the simplicity of our tabular format allows us to use standard off-the-shelf functionalities and mature tooling, like the pandas and graph-tool libraries in Python. We can also compute embeddings with Lerer et al. (2019) or plug CSKG into a reasoner, e.g., Wang et al. (2018), with minimal adaptation, as these expect tabular inputs.

Node representation Each node is described by six columns: id (reused from the source data when possible), label (its primary label), aliases (additional labels), pos (part-of-speech tag, if applicable), datasource (original source, one of: ‘cn’, ‘vg’, ‘wn’, ‘rg’, ‘wd’, ‘fn’, ‘at’, or ‘mowgli’ for custom nodes), and other (a dictionary with provenance information, e.g., ‘image_id’). Due to its generality, this node representation is suitable for any kind of node: object, state, class, or action, thus satisfying our principle P1. To accommodate P5, we ensure easy access to node labels and aliases by assigning them dedicated columns.

Edge representation Edges are described by six columns: subject (the subject ID of the edge), predicate (edge label), object (its object ID), datasource (its original source: ‘cn’, ‘vg’, ‘wn’, ‘rg’, ‘wd’, ‘fn’, ‘at’, or ‘mowgli’ for custom relations), weight (as given by the source), and other (provenance information, e.g., source sentence). Following the approach of ConceptNet and our principle P2, we minimize the set of edges. We introduce a dedicated relation, mw:SameAs, to indicate identity. When identity between two nodes is readily available in an external source, e.g., as a mapping, we create a link with a weight of 1.0 (principle P3). Per P4, we also predict implicit edges with weights between 0 and 1.
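The two six-column tables described above can be sketched in a few lines of standard-library Python. This is only an illustration of the layout: the column names come from the text, while the toy rows, the tab-separated serialization, the `|` alias separator, and the helper names `to_tsv` and `build_label_index` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the text.
NODE_COLUMNS = ["id", "label", "aliases", "pos", "datasource", "other"]
EDGE_COLUMNS = ["subject", "predicate", "object", "datasource", "weight", "other"]

# Toy rows; the values are illustrative only.
nodes = [
    {"id": "/c/en/piano", "label": "piano", "aliases": "", "pos": "",
     "datasource": "cn", "other": ""},
    {"id": "/c/en/key", "label": "key", "aliases": "", "pos": "",
     "datasource": "cn", "other": ""},
    {"id": "vg:key", "label": "key", "aliases": "keys", "pos": "",
     "datasource": "vg", "other": "{'image_id': 1}"},
]
edges = [
    {"subject": "/c/en/piano", "predicate": "/r/RelatedTo", "object": "/c/en/key",
     "datasource": "cn", "weight": "1.0", "other": ""},
]


def to_tsv(rows, columns):
    """Serialize a list of row dicts as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_label_index(node_rows, alias_separator="|"):
    """Map each lowercased label or alias to the ids of the nodes carrying it."""
    index = defaultdict(set)
    for row in node_rows:
        names = [row["label"]] + row["aliases"].split(alias_separator)
        for name in names:
            if name:
                index[name.lower()].add(row["id"])
    return index


nodes_tsv = to_tsv(nodes, NODE_COLUMNS)
edges_tsv = to_tsv(edges, EDGE_COLUMNS)
label_index = build_label_index(nodes)
```

Because labels and aliases sit in dedicated columns, a lookup such as `label_index["key"]` needs no graph traversal, which is the point of principle P5.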

Figure 1: Snippet from the Common Sense Knowledge Graph (CSKG). The yellow nodes are created by merging nodes from individual resources, e.g., vg:key+/c/en/key combines one lexical expression from Visual Genome and one from ConceptNet.

5 The Common Sense Knowledge Graph

In its current form (Figure 1), CSKG integrates seven sources: a commonsense knowledge graph ConceptNet, a visual commonsense source Visual Genome, a procedural source ATOMIC, a general-domain source Wikidata, and three lexical sources, WordNet, Roget, and FrameNet. Here, we present our design decisions per source, their integration, and statistics on the resulting CSKG graph. We also present initial investigations on utilizing CSKG for commonsense question answering.

5.1 Consolidation

5.1.1 Individual sources

ConceptNet We keep its original data and representation, and we increase its consistency and connectivity in two ways. Firstly, we compute closure over the seven symmetric relations defined in ConceptNet as the symmetry was not consistently reflected in the data. Secondly, we observe that different forms of a concept are not formally linked. We introduce two mutually inverse relations, mw:POSForm and mw:IsPOSFormOf, to link a lemma to its part-of-speech version. Also, we define a mw:PartOfSpeech class, and we add edges of type mw:OMWordnetOffset between a lemma node and its WordNet v3.1 offset form.
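The symmetric closure step can be illustrated with a short sketch. The text does not enumerate the seven symmetric relations, so the set below is only an assumed, partial example, and `symmetric_closure` is a hypothetical helper name.

```python
# Assumed subset of symmetric relations, for illustration only.
SYMMETRIC_RELATIONS = {"/r/RelatedTo", "/r/Synonym", "/r/Antonym"}


def symmetric_closure(edges, symmetric_relations):
    """Return the edge set with a reversed copy of every symmetric edge added."""
    closed = set(edges)
    for subject, predicate, obj in edges:
        if predicate in symmetric_relations:
            closed.add((obj, predicate, subject))
    return closed


toy_edges = [
    ("/c/en/piano", "/r/RelatedTo", "/c/en/key"),
    ("/c/en/piano", "/r/UsedFor", "/c/en/music"),
]
closed_edges = symmetric_closure(toy_edges, SYMMETRIC_RELATIONS)
```

Only the /r/RelatedTo edge gains a reversed counterpart here; the directed /r/UsedFor edge is left untouched.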

Visual Genome does not have its data formatted in a Semantic Web compliant format. We follow the ConceptNet approach and represent the label of each object, relation, and attribute as a node in the graph (e.g., vg:dog). We introduce two relations vg:Subject and vg:Object to indicate the subject and the object of a relation node. We also establish direct, symmetric links between a subject and an object node by reusing the relation /r/RelatedTo from ConceptNet. This relation is also leveraged to represent relations between nodes and their properties. To contextualize an object/relation in terms of its image, we create nodes for image objects (prefix vg:I*) and utilize the relation vg:InImage. For each object, relation, or attribute, we use the relation mw:PWordnetSynset to connect it to its WordNet v3.0 synset provided in the data dump.
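One way to read the Visual Genome modeling above is sketched below for a single (subject, relation, object) annotation. The direction of the vg:Subject and vg:Object edges (from the relation node to its arguments), the underscore handling of multi-word labels, and the function name `reify_relation` are assumptions; the text fixes only the relation names and the vg: and vg:I prefixes.

```python
def vg_id(label):
    """Build a node id from a label (underscores for spaces are an assumption)."""
    return "vg:" + label.strip().lower().replace(" ", "_")


def reify_relation(subject, relation, obj, image_id):
    """Turn one Visual Genome relation annotation into a list of edges."""
    s, r, o = vg_id(subject), vg_id(relation), vg_id(obj)
    image = f"vg:I{image_id}"
    edges = [
        (r, "vg:Subject", s),      # relation node points at its subject
        (r, "vg:Object", o),       # relation node points at its object
        (s, "/r/RelatedTo", o),    # direct, symmetric link between the two
        (o, "/r/RelatedTo", s),
    ]
    # Contextualize every label node by the image it was observed in.
    edges += [(node, "vg:InImage", image) for node in (s, r, o)]
    return edges


vg_edges = reify_relation("man", "playing", "piano", 1)
```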

WordNet We include hypernymy from WordNet v3.0, via the rdfs:subClassOf relation.

Roget We include all synonyms and antonyms between words in Roget, by reusing the ConceptNet relations /r/Synonym and /r/Antonym.

ATOMIC We include the entire knowledge graph, preserving the original nodes and relations. To enhance lexical matching over the labels in CSKG, we normalize the labels of the events and their attributes: converting them to lowercase, removing references to ‘Person*’, and excluding ‘none’ values.
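The label normalization can be approximated as follows. The three steps (lowercasing, dropping ‘Person*’ references, excluding ‘none’) are taken from the text; the exact pattern used to detect person references, including the handling of possessives, is an assumption, as is the function name `normalize_label`.

```python
import re

# Matches tokens such as "personx", "person y", optionally followed by "'s".
# The pattern is an assumption about how 'Person*' references look.
PERSON_PATTERN = re.compile(r"\bperson\s?[xyz]\b('s)?")


def normalize_label(label):
    """Lowercase a label, strip person placeholders, and drop 'none' values."""
    text = label.lower()
    text = PERSON_PATTERN.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    if text in ("", "none"):
        return None
    return text
```

For example, under these assumptions `normalize_label("PersonX plays the piano")` yields `"plays the piano"`, and a value of `"none"` is discarded by returning `None`.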

Wikidata We include the Wikidata taxonomy through the rdfs:subClassOf relation.

FrameNet Four node types from the FrameNet ontology are imported into CSKG: Frames, Frame Elements (FEs), Lexical units (LUs), and Semantic Types (STs). We reuse 5 categories of FrameNet edges: Frame-Frame (13 edge types), Frame-FE (1 edge type), Frame-LU (1 edge type), FE-ST (1 edge type), and ST-ST (3 edge types).

5.1.2 Mappings

WordNet-WordNet The WordNet v3.1 offsets in ConceptNet and the WordNet v3.0 synsets from Visual Genome are aligned by leveraging ILI: the WordNet InterLingual Index. The generated 117,097 mappings are expressed through our identity relation, mw:SameAs.

WordNet-Wikidata We compute probabilistic links between WordNet synsets and Wikidata taxonomy nodes. Our approach consists of three components: a Candidate Retrieving Module (CRM), a Similarity Calculating Module (SCM), and a Mapping Module (MM). CRM retrieves candidate nodes from a customized ElasticSearch index of Wikidata. Concretely, it matches a synset word in any text field (including labels, aliases, and descriptions), and it ranks the candidates with a version of the default TF-IDF-based algorithm which adapts the score proportionally to the number of incoming links. The top-ranked candidates are retained. Then, SCM computes sentence embeddings of the descriptions of the WordNet synset and each of the Wikidata candidates by using a pre-trained XLNet model Yang et al. (2019). The similarity between a synset and a Wikidata node is computed as the cosine similarity between their corresponding embeddings. MM creates a mw:SameAs edge between a WordNet synset and the Wikidata candidate with the highest similarity. This similarity is represented as the weight of the mapping edge. The accuracy of each mapping has been validated by one student. In total, 17 students took part in this validation. Out of all edges produced by the algorithm (112,012), the manual validation marked 57,145 as correct. We keep these in CSKG and discard the rest.
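The similarity and mapping steps (SCM and MM) reduce to a cosine comparison followed by an argmax, which the sketch below illustrates. It assumes the description embeddings have already been computed and are given as plain lists of floats; the toy vectors, the placeholder identifiers `wd:Q_A` and `wd:Q_B`, and the function names are hypothetical, and neither the candidate retrieval nor the manual validation is modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if one is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_mapping(synset_id, synset_vector, candidates):
    """Pick the most similar candidate and return a weighted mw:SameAs edge.

    `candidates` maps a candidate node id to its description embedding.
    """
    if not candidates:
        return None
    scored = ((node_id, cosine_similarity(synset_vector, vector))
              for node_id, vector in candidates.items())
    best_id, best_score = max(scored, key=lambda pair: pair[1])
    return (synset_id, "mw:SameAs", best_id, best_score)


# Toy embeddings; real ones would come from a sentence encoder.
mapping_edge = best_mapping(
    "wn:piano.n.01",
    [1.0, 0.0, 1.0],
    {"wd:Q_A": [1.0, 0.1, 0.9], "wd:Q_B": [0.0, 1.0, 0.0]},
)
```

The resulting edge carries the similarity as its weight, mirroring how the mapping weight is described in the text.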

FrameNet-ConceptNet We connect the FrameNet nodes to ConceptNet in two ways. Its lexical units are mapped to corresponding ConceptNet nodes through the Predicate Matrix (cf. section 3), producing mw:SameAs edges.[7] Then, we use hand-annotated sentences from the FrameNet corpus, each annotated with its target frame, a set of FEs, and the words associated with each FE. We consider the set of words to be an instance of that specific FE. We ground these sets of words to ConceptNet with the rule-based method of Lin et al. (2019), thus adding 45,659 mw:HasInstance edges.

[7] During this step, we manually fixed nodes which contained spelling and tokenization errors in the Predicate Matrix.

Roget-ConceptNet We establish 60,307 mw:SameAs relations between word-representing nodes in Roget and in ConceptNet by a simple lexical match of their labels.

Visual Genome-ConceptNet We establish 32,283 mw:SameAs relations between lexical nodes in Visual Genome and in ConceptNet by exact matching over their labels.

ATOMIC-ConceptNet We establish 14,272 mw:SameAs relations between lexical nodes in ATOMIC and in ConceptNet by an exact match of their labels.
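The three label-based mappings above (Roget, Visual Genome, and ATOMIC to ConceptNet) share the same basic operation, sketched here. Whether matching is case-sensitive and which weight the resulting edges receive are not stated in the text, so the lowercasing and the weight of 1.0 are assumptions, as is the helper name `exact_label_matches`.

```python
from collections import defaultdict


def exact_label_matches(source_nodes, target_nodes):
    """Link nodes from two sources whose labels are identical.

    Both arguments map a node id to its label. Returns mw:SameAs edges
    as (source_id, predicate, target_id, weight) tuples.
    """
    by_label = defaultdict(list)
    for node_id, label in target_nodes.items():
        by_label[label.strip().lower()].append(node_id)

    edges = []
    for node_id, label in source_nodes.items():
        for target_id in by_label.get(label.strip().lower(), []):
            edges.append((node_id, "mw:SameAs", target_id, 1.0))
    return edges


roget_nodes = {"rg:water": "water", "rg:truncate": "truncate"}
conceptnet_nodes = {"/c/en/water": "water", "/c/en/piano": "piano"}
same_as_edges = exact_label_matches(roget_nodes, conceptnet_nodes)
```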

5.1.3 Refinement

We consolidate the seven sources and six mappings as follows. Firstly, we deduplicate each edge table by aggregating over its first three columns (subject, predicate, object) and combining the values for the remaining columns. Similarly, we deduplicate each node table by aggregating over its id column and combining the values for the other columns. Secondly, we concatenate all 13 edge tables into one, and likewise the 6 node tables, to produce a raw version of CSKG. Thirdly, we merge identical nodes, by combining nodes that are connected with a mw:SameAs link. This operation is reflected in both the nodes and the edges table. The deduplicated version of the result is our consolidated CSKG.
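The merge and deduplication steps can be sketched with a small union-find over mw:SameAs links. This is a minimal illustration under several assumptions: edges are simplified to (subject, predicate, object, datasource) tuples, merged ids are formed by joining the sorted member ids with `+` (the text shows `+`-joined ids but not their ordering), combined datasources are joined with `|`, and the mw:SameAs edges themselves are dropped after merging.

```python
SAME_AS = "mw:SameAs"


def find(parent, node):
    """Return the representative of a node's group, compressing paths."""
    while parent.setdefault(node, node) != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def merge_same_as(edges):
    """Merge nodes linked by mw:SameAs and deduplicate the remaining edges."""
    parent = {}
    for subject, predicate, obj, _ in edges:
        if predicate == SAME_AS:
            parent[find(parent, subject)] = find(parent, obj)

    # Collect the members of every identity group and name the merged node.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(parent, node), set()).add(node)
    merged_id = {node: "+".join(sorted(members))
                 for members in groups.values() for node in members}

    # Rewrite edge endpoints, then aggregate over (subject, predicate, object).
    combined = {}
    for subject, predicate, obj, source in edges:
        if predicate == SAME_AS:
            continue
        key = (merged_id.get(subject, subject), predicate,
               merged_id.get(obj, obj))
        combined.setdefault(key, set()).add(source)
    return [(s, p, o, "|".join(sorted(sources)))
            for (s, p, o), sources in sorted(combined.items())]


raw_edges = [
    ("vg:key", "mw:SameAs", "/c/en/key", "mowgli"),
    ("/c/en/piano", "/r/RelatedTo", "/c/en/key", "cn"),
    ("vg:piano", "/r/RelatedTo", "vg:key", "vg"),
    ("/c/en/piano", "/r/RelatedTo", "/c/en/key", "cn"),  # duplicate row
]
consolidated = merge_same_as(raw_edges)
```

In the toy input, the two `key` nodes collapse into one merged node, so the ConceptNet and Visual Genome edges end up pointing at the same target.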

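| | CN | VG | WN | RG | WD | FN | AT | CSKG |
|---|---|---|---|---|---|---|---|---|
| # nodes | 1,787,276 | 316,660 | 87,942 | 71,804 | 2,388,479 | 36,582 | 288,943 | 4,738,502 |
| # edges | 7,211,322 | 4,833,879 | 89,089 | 1,403,461 | 2,926,639 | 79,060 | 704,315 | 17,210,065 |
| max degree | 587,358 | 127,580 | 404 | 1,549 | 964,400 | 2,438 | 9,856 | 964,400 |
| mean degree | 8.07 | 30.53 | 2.03 | 39.09 | 2.45 | 4.32 | 4.87 | 7.26 |
| std degree | 0.34 | 0.83 | 0.02 | 0.34 | 0.58 | 0.16 | 0.06 | 0.32 |
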
Table 2: CSKG statistics. Abbreviations: CN=ConceptNet, VG=Visual Genome, WN=WordNet, RG=Roget, WD=Wikidata, FN=FrameNet, AT=ATOMIC.
Figure 2: Degree distribution (log-log plots).

5.2 Statistics

Basic statistics are shown in Table 2. In total, our mappings produce 284,121 mw:SameAs links. A small portion of the nodes and edges were duplicates. After refinement, i.e., removal of the duplicates and merging of the identical nodes, CSKG consists of 4.7 million nodes and 17.2 million edges. In terms of edges, its largest subgraph is ConceptNet (7.2 million), whereas Visual Genome comes second with 4.8 million edges. Wikidata contributes the largest number of nodes, closely followed by ConceptNet. The three most common relations in CSKG are: /r/RelatedTo (4.6 million), vg:InImage (3 million), and rdfs:subClassOf (3 million).

Degree distribution

The mean degree of CSKG, after merging identical nodes, grows from 7.00 to 7.26. Its standard deviation is 0.32, similar to ConceptNet. The best connected subgraphs are Roget and Visual Genome, with mean degrees of 39.09 and 30.53, respectively. The least connected graphs are the hierarchies of WordNet and Wikidata. The degrees of WordNet and ATOMIC are fairly uniform across their nodes, whereas Visual Genome and Wikidata have higher variation. The large difference between the variations of WordNet and Wikidata can be explained by the fact that WordNet nodes typically have a single parent, whereas the Wikidata ontology is more flat and has many arcs from its leaves to high-level nodes. The maximum degree in CSKG is nearly a million, which is due to Wikidata.
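As a sanity check, the mean degrees reported in Table 2 can be related to the node and edge counts. The sketch below assumes the mean total degree is computed as 2E/N (each edge contributes to the degree of both of its endpoints); this definition is not stated in the text, but it agrees with the reported values to within 0.01.

```python
# (number of nodes, number of edges) per graph, copied from Table 2.
GRAPH_SIZES = {
    "CN": (1_787_276, 7_211_322),
    "VG": (316_660, 4_833_879),
    "RG": (71_804, 1_403_461),
    "WD": (2_388_479, 2_926_639),
    "CSKG": (4_738_502, 17_210_065),
}


def mean_degree(num_nodes, num_edges):
    """Mean total degree, counting every edge once per endpoint."""
    return 2 * num_edges / num_nodes


mean_degrees = {name: round(mean_degree(n, e), 2)
                for name, (n, e) in GRAPH_SIZES.items()}
```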

Figure 3: PageRank distribution.

Figure 2 presents the input, output, and total degree distributions. The highest out-degree is 10 times lower than the highest in-degree. Although most nodes have a low degree and the frequency decreases for the higher degrees, a number of nodes still have a total degree of 10, and many others have a much higher one. This indicates that CSKG is a well-connected graph.

Centrality We compute PageRank and HITS metrics of CSKG. Its PageRank distribution (Figure 3) indicates that while most nodes have a low PageRank value, some have a PageRank of over 0.001. The top PageRank nodes are: wn:polypeptide.n.01+/c/en/polypeptide/n/wn/substance+wd:Q8054 (polypeptide as a noun, merged from three sources), wd:Q7187 (gene), and wd:Q20747295 (protein-coding gene). According to HITS, the hubs in CSKG are: wd:Q20747295 (protein-coding gene), wd:Q7187 (gene), and wd:Q427087 (non-coding RNA). The top authorities are also Wikidata nodes. The dominance of Wikidata in the centrality metrics is due to its taxonomy, where subclass relations are often directed at high-level nodes. It is unclear, however, whether such bio-informatics nodes hold value for commonsense reasoning. Future work should investigate which subsets of Wikidata contain commonsense knowledge.

5.3 CSKG on Downstream Tasks

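| | train #Q | train CN | train CSKG-CN | train CSKG | dev #Q | dev CN | dev CSKG-CN | dev CSKG |
|---|---|---|---|---|---|---|---|---|
| CSQA | 9,741 | 78,729 | 106,619 | 153,442 | 1,221 | 9,758 | 13,132 | 19,036 |
| SIQA | 33,410 | 126,596 | 189,859 | 330,200 | 1,954 | 7,850 | 11,654 | 19,953 |
| PIQA | 16,113 | 18,549 | 28,996 | 70,131 | 1,838 | 2,170 | 3,401 | 8,071 |
| aNLI | 169,654 | 257,163 | 389,640 | 771,318 | 1,532 | 5,603 | 8,477 | 16,456 |
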
Table 3: Number of triples retrieved with ConceptNet (CN), CSKG’s ConceptNet subset (CSKG-CN), and CSKG on different datasets. #Q=number of questions.

In a preliminary investigation, we measure the relevance of CSKG for commonsense question answering tasks, by comparing the number of retrieved triples that connect keywords in the question and in the answers. For this purpose, we adapt the lexical grounding in HyKAS Ma et al. (2019) to retrieve triples from CSKG instead of its default knowledge source, ConceptNet. We expect that CSKG can provide much more evidence than ConceptNet, both in terms of number of triples and their diversity. We experiment with four commonsense datasets: CommonSense QA (CSQA) Talmor et al. (2018), Social IQA (SIQA) Sap et al. (2019b), Physical IQA (PIQA) Bisk et al. (2019), and abductive NLI (aNLI) Bhagavatula et al. (2019). As shown in Table 3, CSKG significantly increases the number of evidence triples that connect terms in questions with terms in answers, in comparison to ConceptNet. We note that the increase on all datasets is roughly three-fold, the expected exception being CSQA, which was inferred from ConceptNet.

We inspect a sample of questions to gain insight into whether the additional triples are relevant and could benefit reasoning. For instance, let us consider the CSQA question “Bob the lizard lives in a warm place with lots of water. Where does he probably live?”, whose correct answer is “tropical rainforest”. In addition to the ConceptNet triple /c/en/lizard /r/AtLocation /c/en/tropical_rainforest, CSKG provides two additional triples, stating that tropical is an instance of place[8] and that water is related to tropical.[9] The first additional edge stems from our mappings from FrameNet to ConceptNet, whereas the second comes from Visual Genome. Interestingly, the above example comes from CSQA, which has been inferred from ConceptNet, showing that much commonsense knowledge is still missing in existing resources. We note that, while CSKG increases the coverage with respect to available commonsense knowledge, it is also incomplete: in the above example, useful information such as warm temperatures being typical for tropical rainforests is still missing.

[8] fn:fe:place mw:HasInstance rg:tropical+/c/en/tropical+vg:tropical

[9] rg:water+/c/en/water+at:water+vg:water /r/RelatedTo rg:tropical+/c/en/tropical+vg:tropical
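The retrieval of evidence triples that connect question terms with answer terms can be sketched as below, using the lizard example. This is a simplified stand-in, not the HyKAS grounding itself: it assumes terms are grounded by exact lookup in a label index, and the index contents, the extra piano edge, and the function names `ground` and `connecting_triples` are made up for illustration.

```python
def ground(terms, label_to_nodes):
    """Collect the node ids whose label matches any of the given terms."""
    grounded = set()
    for term in terms:
        grounded |= label_to_nodes.get(term.lower(), set())
    return grounded


def connecting_triples(question_terms, answer_terms, edges, label_to_nodes):
    """Return the triples that link a question node to an answer node."""
    question_nodes = ground(question_terms, label_to_nodes)
    answer_nodes = ground(answer_terms, label_to_nodes)
    return [
        (s, p, o) for s, p, o in edges
        if (s in question_nodes and o in answer_nodes)
        or (s in answer_nodes and o in question_nodes)
    ]


TROPICAL = "rg:tropical+/c/en/tropical+vg:tropical"
WATER = "rg:water+/c/en/water+at:water+vg:water"

label_to_nodes = {
    "lizard": {"/c/en/lizard"},
    "place": {"fn:fe:place"},
    "water": {WATER},
    "tropical": {TROPICAL},
    "tropical rainforest": {"/c/en/tropical_rainforest"},
}
graph_edges = [
    ("/c/en/lizard", "/r/AtLocation", "/c/en/tropical_rainforest"),
    ("fn:fe:place", "mw:HasInstance", TROPICAL),
    (WATER, "/r/RelatedTo", TROPICAL),
    ("/c/en/piano", "/r/RelatedTo", "/c/en/key"),  # unrelated edge
]

evidence = connecting_triples(
    ["lizard", "place", "water"],
    ["tropical rainforest", "tropical"],
    graph_edges,
    label_to_nodes,
)
```

Counting the triples returned for each question, once per knowledge source, gives the kind of comparison summarized in Table 3.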

6 Discussion

The graph metrics in the previous section indicate that CSKG is well-connected, thus showing the impact of our mappings across sources and the merge of identical nodes. Furthermore, the novel evidence brought by CSKG on downstream QA tasks (section 5.3) is a signal that can be exploited by reasoning systems to enhance their performance and robustness. What are the next steps for CSKG? We discuss three ongoing pursuits.

Downstream tasks Injecting knowledge from ConceptNet has improved the performance of existing question-answering systems like KagNet Lin et al. (2019), GapQA Khot et al. (2019), and HyKAS Ma et al. (2019).[10] At the same time, the error analyses and discussions of these systems reveal that missing knowledge is directly responsible for a portion of their errors. We expect that the richer and more diverse knowledge captured by CSKG, as quantified in section 5, would benefit commonsense QA systems. To validate this expectation, we developed a modular evaluation framework, and we computed various graph and word embeddings of CSKG and its subsets. At present, we are running experiments with adaptations of HyKAS and KagNet on six such datasets.

[10] For an overview of commonsense knowledge sources, reasoners, and benchmarks, see Storks et al. (2019).

New resources and mappings We intend to continue integrating the resources listed in section 2: WebChild, VerbNet, and CEO. We envision their integration following our approach to be fairly straightforward. Moreover, as machine common sense knowledge is an active research area (section 1), we expect additional knowledge sources to be released and integrated in CSKG in the near future. Similarly, we intend to create further links between the resources to maximize the connectivity of the ingredients within CSKG.

Semantic enrichment The semantics of CSKG could be improved by refining its relations. For instance, its most common ConceptNet relation, /r/RelatedTo, abstracts over various specific predicates, and its full set of relations is unknown and potentially large. In Figure 1, it expresses containment (piano related to piano key) and inheritance (piano key related to key), while elsewhere (farmer related to man) it obfuscates occupation. Clustering its knowledge into more specific predicates would improve the semantics of CSKG.

7 Conclusions and Future Work

The traditional goal of capturing, representing, and leveraging commonsense knowledge has recently gained traction, thanks to initiatives like DARPA’s Machine Common Sense Gunning (2018). In this paper, we reviewed representative commonsense knowledge sources. While they contain complementary knowledge that would be beneficial as a whole for downstream tasks, such usage is prevented by their different approaches, foci, strengths, and weaknesses. Optimizing for simplicity, modularity, and utility, we proposed a property graph that describes many nodes with a few edge types, maximizes the high-quality links across subgraphs, and enables natural language access. We applied this approach to consolidate a commonsense knowledge graph (CSKG) from seven very diverse sources: a text-based commonsense knowledge graph ConceptNet, a general-purpose taxonomy Wikidata, an image description dataset Visual Genome, a procedural knowledge source ATOMIC, and three lexical sources: WordNet, Roget, and FrameNet. CSKG describes 4.7 million nodes with 17.2 million statements. Our analysis showed that CSKG is a well-connected graph and more than ‘a simple sum of its parts’. On four commonsense QA datasets, it consistently increased the recall of relevant triples by a factor of 2-4 compared to ConceptNet. At present, we are investigating whether this additional signal brought by CSKG helps reasoning on these tasks, and we are integrating further resources reviewed in this paper. CSKG will be released under CC BY-SA 4.0, the most permissive license allowed by its components.

References


  • C. F. Baker, C. J. Fillmore, and J. B. Lowe (1998) The berkeley framenet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pp. 86–90. Cited by: §1, §2.
  • C. Bhagavatula, R. L. Bras, C. Malaviya, K. Sakaguchi, A. Holtzman, H. Rashkin, D. Downey, S. W. Yih, and Y. Choi (2019) Abductive commonsense reasoning. arXiv preprint arXiv:1908.05739. Cited by: §5.3, footnote 1.
  • Y. Bisk, R. Zellers, R. L. Bras, J. Gao, and Y. Choi (2019) PIQA: reasoning about physical commonsense in natural language. arXiv preprint arXiv:1911.11641. Cited by: §5.3.
  • F. Corcoglioniti, M. Rospocher, A. P. Aprosio, and S. Tonelli (2016) PreMOn: a lemon extension for exposing predicate models as linked data. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 877–884. Cited by: §3.
  • M. L. De Lacalle, E. Laparra, I. Aldabe, and G. Rigau (2016) Predicate matrix: automatically extending the semantic interoperability between predicate resources. Language Resources and Evaluation 50 (2), pp. 263–289. Cited by: §3.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §2.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  • A. Ettinger (2020) What bert is not: lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics 8, pp. 34–48. Cited by: footnote 1.
  • D. Gunning (2018) Machine common sense concept paper. arXiv preprint arXiv:1810.07528. Cited by: §1, §7.
  • T. Khot, A. Sabharwal, and P. Clark (2019) What’s missing: a knowledge gap guided approach for multi-hop question answering. arXiv preprint arXiv:1909.09253. Cited by: §6.
  • B. Kipfer (2005) Roget’s 21st century thesaurus in dictionary form (3rd ed.). New York: The Philip Lief Group, Inc. Cited by: §2.
  • R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, et al. (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123 (1), pp. 32–73. Cited by: §1, §2.
  • A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich (2019) Pytorch-biggraph: a large-scale graph embedding system. arXiv preprint arXiv:1903.12287. Cited by: §4.2.
  • B. Y. Lin, X. Chen, J. Chen, and X. Ren (2019) Kagnet: knowledge-aware graph networks for commonsense reasoning. arXiv preprint arXiv:1909.02151. Cited by: §1, §5.1.2, §6, footnote 1.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. Cited by: §1.
  • K. Ma, J. Francis, Q. Lu, E. Nyberg, and A. Oltramari (2019) Towards generalizable neuro-symbolic systems for commonsense question answering. arXiv preprint arXiv:1910.14087. Cited by: §5.3, §6, footnote 1.
  • G. Marcus (2020) The next decade in ai: four steps towards robust artificial intelligence. arXiv preprint arXiv:2002.06177. Cited by: footnote 1.
  • J. McCarthy (1960) Programs with common sense. RLE and MIT computation center. Cited by: §1.
  • J. P. McCrae (2018) Mapping wordnet instances to wikipedia. In Proceedings of the 9th Global WordNet Conference (GWC 2018), pp. 62–69. Cited by: §3.
  • G. A. Miller (1995) WordNet: a lexical database for english. Communications of the ACM 38 (11), pp. 39–41. Cited by: §1, §2.
  • T. Miller and I. Gurevych (2014) WordNet-Wikipedia-Wiktionary: construction of a three-way alignment. In LREC, pp. 2094–2100. Cited by: §3.
  • B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik (2016) Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. International Journal of Computer Vision, pp. 1–20. Cited by: §2.
  • K. Richardson and A. Sabharwal (2019) What does my qa model know? devising controlled probes using expert knowledge. arXiv preprint arXiv:1912.13337. Cited by: footnote 1.
  • M. Sap, R. Le Bras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, and Y. Choi (2019a) Atomic: an atlas of machine commonsense for if-then reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3027–3035. Cited by: §1, §2.
  • M. Sap, H. Rashkin, D. Chen, R. LeBras, and Y. Choi (2019b) Socialiqa: commonsense reasoning about social interactions. arXiv preprint arXiv:1904.09728. Cited by: §5.3.
  • K. K. Schuler (2005) VerbNet: a broad-coverage, comprehensive verb lexicon. Cited by: §2.
  • R. Segers, T. Caselli, and P. Vossen (2018) The circumstantial event ontology (ceo) and ecb+/ceo: an ontology and corpus for implicit causal relations between events. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Cited by: §2.
  • R. Speer, J. Chin, and C. Havasi (2017) Conceptnet 5.5: an open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §1, §2, footnote 1.
  • S. Storks, Q. Gao, and J. Y. Chai (2019) Commonsense reasoning for natural language understanding: a survey of benchmarks, resources, and approaches. arXiv preprint arXiv:1904.01172. Cited by: footnote 10.
  • A. Talmor, J. Herzig, N. Lourie, and J. Berant (2018) Commonsenseqa: a question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937. Cited by: §5.3.
  • N. Tandon, G. De Melo, and G. Weikum (2017) Webchild 2.0: fine-grained commonsense knowledge distillation. In Proceedings of ACL 2017, System Demonstrations, pp. 115–120. Cited by: §1, §2.
  • D. Trumbo (2006) Increasing the usability of research lexica. Ph.D. Thesis, University of Colorado at Boulder. Cited by: §3.
  • E. van Miltenburg (2016) Stereotyping and bias in the flickr30k dataset. In Proceedings of Multimodal Corpora: Computer vision and language processing (MMC 2016), J. Edlund, D. Heylen, and P. Paggio (Eds.), pp. 1–4. Cited by: footnote 2.
  • D. Vrandečić and M. Krötzsch (2014) Wikidata: a free collaborative knowledgebase. Communications of the ACM 57 (10), pp. 78–85. Cited by: §1, §2.
  • L. Wang, M. Sun, W. Zhao, K. Shen, and J. Liu (2018) Yuanfudao at semeval-2018 task 11: three-way attention and relational knowledge for commonsense machine comprehension. arXiv preprint arXiv:1803.00191. Cited by: §4.2.
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le (2019) Xlnet: generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pp. 5754–5764. Cited by: §5.1.2.
  • A. Zareian, S. Karaman, and S. Chang (2020) Bridging knowledge graphs to generate scene graphs. arXiv preprint arXiv:2001.02314. Cited by: §3.
  • R. Zellers, Y. Bisk, R. Schwartz, and Y. Choi (2018) Swag: a large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326. Cited by: §1.