Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

06/03/2018
by Alexander Herzog, et al.

Topic models are among the most widely used methods in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to estimated topics. This process of manual labeling is not scalable and often problematic because it depends on the domain expertise of the researcher and may be affected by cardinality in human decision making. As a consequence, insights drawn from a topic model are difficult to replicate. We present a semi-automatic transfer topic labeling method that seeks to remedy some of these problems. We take advantage of the fact that many areas of research already have domain-specific codebooks that can be exploited for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the coding instructions of the Comparative Agendas Project to label topics. We show that our method works well for a majority of the topics we estimate, but we also find institution-specific topics, in particular on subnational governance, that require manual input. The method proposed in the paper can be easily extended to other areas with existing domain-specific knowledge bases, such as party manifestos, open-ended survey questions, social media data, and legal documents, in ways that can add knowledge to research programs.
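The general idea of labeling topics from a codebook can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the matching procedure, so the bag-of-words cosine similarity, the `threshold` value, the function names, and the toy codebook and topics below are all assumptions made for illustration. A topic whose best match falls below the threshold is left unlabeled, mirroring the idea that some topics still require manual input.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def label_topics(topics, codebook, threshold=0.2):
    """Give each topic its best-matching codebook category, or None for manual review.

    Assumption: similarity is plain word overlap; the threshold is arbitrary.
    """
    code_vecs = {name: Counter(text.lower().split()) for name, text in codebook.items()}
    labels = {}
    for topic_id, top_words in topics.items():
        topic_vec = Counter(w.lower() for w in top_words)
        best, score = max(
            ((name, cosine(topic_vec, vec)) for name, vec in code_vecs.items()),
            key=lambda pair: pair[1],
        )
        labels[topic_id] = best if score >= threshold else None
    return labels


# Hypothetical toy data, invented for illustration only.
codebook = {
    "Health": "health hospital doctors nurses patients care",
    "Defence": "defence army navy forces military weapons",
}
topics = {
    0: ["hospital", "patients", "nurses", "care"],
    1: ["army", "forces", "weapons", "navy"],
    2: ["scotland", "wales", "devolution", "assembly"],
}
labels = label_topics(topics, codebook)
```

In this toy setup, topics 0 and 1 share vocabulary with a codebook entry and receive its label, while topic 2 shares none and is flagged for manual labeling. A realistic pipeline would likely weight topic words by their estimated probabilities and use a richer text representation than raw word overlap.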


