Multi-View Clustering for Open Knowledge Base Canonicalization

06/22/2022
by   Wei Shen, et al.

Open information extraction (OIE) methods extract large numbers of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, and these triples compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. The task of OKB canonicalization clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. Two views of knowledge (a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to this task. However, existing works have so far leveraged these two views in isolation. In this paper, we propose CMVC, a novel unsupervised framework that leverages the two views jointly to canonicalize OKBs without the need for manually annotated labels. To achieve this goal, we propose a multi-view CH K-Means clustering algorithm that mutually reinforces the clustering of the view-specific embeddings learned from each view by considering their different clustering qualities. To further enhance canonicalization performance, we propose a training data optimization strategy, in terms of data quantity and data quality respectively in each particular view, that iteratively refines the learned view-specific embeddings. Additionally, we propose a Log-Jump algorithm that predicts the optimal number of clusters in a data-driven way without requiring any labels. We demonstrate the superiority of our framework through extensive experiments on multiple real-world OKB data sets against state-of-the-art methods.


