Static Embeddings as Efficient Knowledge Bases?

04/14/2021
by Philipp Dufter, et al.

Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study the knowledge contained in static embeddings. We show that, when restricting the output space to a candidate set, simple nearest neighbor matching using static embeddings performs better than PLMs. E.g., static embeddings perform 1.6% points better than BERT while using only 0.3% of the energy needed for training. One important factor in this good comparative performance is that static embeddings are standardly learned for a large vocabulary. In contrast, BERT exploits its more sophisticated, but expensive, ability to compose meaningful representations from a much smaller subword vocabulary.
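
To make the probing setup concrete, here is a minimal sketch of candidate-restricted nearest neighbor matching with static embeddings. The toy vectors, the cosine scoring, and the predict helper are illustrative assumptions, not the paper's exact pipeline (which uses real pretrained word vectors over large vocabularies):

```python
import numpy as np

# Toy static embeddings (hypothetical values; in practice these would be
# pretrained word vectors such as fastText or word2vec).
embeddings = {
    "Paris":   np.array([0.9, 0.1, 0.0]),
    "France":  np.array([0.8, 0.2, 0.1]),
    "Germany": np.array([0.1, 0.9, 0.0]),
    "Japan":   np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict(subject, candidates):
    """Answer a KB-style query ('Paris is the capital of [MASK]') by
    nearest neighbor matching: score each candidate object against the
    subject's static embedding and return the best match. The output
    space is restricted to `candidates`, as in the evaluation described
    in the abstract."""
    return max(candidates, key=lambda c: cosine(embeddings[subject], embeddings[c]))

print(predict("Paris", ["France", "Germany", "Japan"]))  # -> France
```

The candidate restriction is the key design choice here: ranking a fixed list of objects plays to the strength of a large static vocabulary, whereas open-ended prediction over the full output space is where BERT's compositional subword representations help.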
