Interpreting Language Models Through Knowledge Graph Extraction

by Vinitra Swamy et al.

Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing community and are commonly used as a starting point for downstream tasks. While these models are undeniably useful, quantifying their performance beyond traditional accuracy metrics remains a challenge. In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from the training corpora can be uncovered by querying a masked language model with probing tasks. We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variants of BERT (DistilBERT, BERT-base, RoBERTa). This work proposes a quantitative framework for comparing language models through knowledge graph extraction, using graph edit distance (GED) and Graph2Vec, and showcases a part-of-speech analysis (POSOR) to identify the linguistic strengths of each model variant. With these metrics, machine learning practitioners can compare models, diagnose behavioral strengths and weaknesses, and identify new targeted datasets to improve model performance.
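The core idea above (turn cloze predictions into knowledge graph extracts, then compare extracts with a graph-distance metric) can be illustrated with a minimal sketch. The prediction dictionaries below are hypothetical stand-ins for a masked language model's top-1 fill-in answers at two training checkpoints, and the edge-set symmetric difference is only a simple proxy for the full graph edit distance the paper uses:

```python
# Hedged sketch: build knowledge graph extracts from cloze-style predictions
# and compare two model snapshots with a simple edit-distance proxy.

def build_kg(cloze_predictions):
    """Turn (subject, relation) -> predicted object pairs into a set of
    (subject, relation, object) triples, i.e. labeled directed edges."""
    return {(s, r, o) for (s, r), o in cloze_predictions.items()}

def edit_distance(kg_a, kg_b):
    """Edit-distance proxy on labeled edge sets: the number of edge
    insertions/deletions needed to turn kg_a into kg_b."""
    return len(kg_a ^ kg_b)  # symmetric difference of triple sets

# Hypothetical top-1 cloze answers from two training checkpoints
early = {("Paris", "capital_of"): "France", ("Dante", "born_in"): "Italy"}
late = {("Paris", "capital_of"): "France", ("Dante", "born_in"): "Florence"}

kg_early, kg_late = build_kg(early), build_kg(late)
print(edit_distance(kg_early, kg_late))  # 2: one triple removed, one added
```

In the paper's setting, the prediction dictionaries would come from running cloze statements through each RoBERTa checkpoint, and the set-difference count would be replaced by a proper GED computation or a Graph2Vec embedding distance.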

