COMBO: A Complete Benchmark for Open KG Canonicalization

02/08/2023
by Chengyue Jiang, et al.

An open knowledge graph (KG) consists of (subject, relation, object) triples extracted from millions of raw texts. The subject and object noun phrases and the relation phrases in an open KG are severely redundant and ambiguous, and need to be canonicalized. Existing datasets for open KG canonicalization only provide gold entity-level canonicalization for noun phrases. In this paper, we present COMBO, a Complete Benchmark for Open KG canonicalization. Compared with existing datasets, we additionally provide gold canonicalization for relation phrases, gold ontology-level canonicalization for noun phrases, and the source sentences from which the triples are extracted. We also propose metrics for evaluating each type of canonicalization. On the COMBO dataset, we empirically compare previously proposed canonicalization methods with a few simple baseline methods based on pretrained language models. We find that properly encoding the phrases in a triple using pretrained language models results in better relation canonicalization and better ontology-level canonicalization of noun phrases. We release our dataset, baselines, and evaluation scripts at https://github.com/jeffchy/COMBO/tree/main.
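To make the task concrete, the following is a minimal sketch of entity-level canonicalization viewed as clustering noun phrases, scored with a generic pairwise precision/recall/F1 over same-cluster pairs. The triples, the cluster assignments, and the helper names `same_cluster_pairs` and `pairwise_prf` are all invented for illustration. They are not taken from the COMBO dataset or its evaluation scripts, and the metrics the paper proposes may differ.

```python
from itertools import combinations

# Toy open KG triples (subject NP, relation phrase, object NP).
# Invented for illustration; not drawn from COMBO.
triples = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "took birth in", "Honolulu, Hawaii"),
    ("NYC", "is located in", "United States"),
]

# Every subject and object noun phrase that needs a canonical cluster.
noun_phrases = {t[0] for t in triples} | {t[2] for t in triples}


def same_cluster_pairs(assignment):
    """Return the set of unordered phrase pairs that share a cluster id."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F1 between two clusterings.

    Both arguments map each phrase to a cluster id. A pair counts as a hit
    when the two phrases share a cluster in both clusterings.
    """
    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    hits = len(pred_pairs & gold_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold clustering: phrases naming the same entity share an id.
gold_np = {
    "Barack Obama": 0, "Obama": 0,
    "Honolulu": 1, "Honolulu, Hawaii": 1,
    "NYC": 2, "United States": 3,
}

# Hypothetical system output: misses one merge and makes one wrong merge.
pred_np = {
    "Barack Obama": 0, "Obama": 0,
    "Honolulu": 1, "Honolulu, Hawaii": 2,
    "NYC": 3, "United States": 3,
}

precision, recall, f1 = pairwise_prf(pred_np, gold_np)
print(f"pairwise P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Relation phrases such as "was born in" and "took birth in" could be clustered and scored in the same way by passing relation-phrase assignments to the same function.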


