Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

04/19/2017
by Mayank Kejriwal, et al.

Word embeddings have made enormous inroads in recent years in a wide variety of text mining applications. In this paper, we explore a word embedding-based architecture for predicting the relevance of a role between two financial entities within the context of natural language sentences. In this extended abstract, we propose a pooled approach that uses a collection of sentences to train word embeddings using the skip-gram word2vec architecture. We use the word embeddings to obtain context vectors that are assigned one or more labels based on manual annotations. We train a machine learning classifier using the labeled context vectors, and use the trained classifier to predict contextual role relevance on test data. Our approach serves as a good minimal-expertise baseline for the task as it is simple and intuitive, uses open-source modules, requires little feature crafting effort and performs well across roles.
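The pipeline described above (pool sentences, train skip-gram word2vec embeddings, derive labeled context vectors, fit a classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy corpus and labels are invented, the choice of averaging token vectors to form a context vector is an assumption, and logistic regression stands in for whichever classifier was actually used.

# Minimal sketch of the pooled approach, assuming gensim and scikit-learn.
# Corpus, labels, averaging step, and classifier choice are illustrative
# placeholders, not the authors' actual data or settings.
from gensim.models import Word2Vec
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy pooled corpus: tokenized sentences mentioning pairs of financial entities.
sentences = [
    ["acme", "corp", "acquired", "a", "majority", "stake", "in", "beta", "bank"],
    ["beta", "bank", "issued", "bonds", "underwritten", "by", "acme", "corp"],
    ["gamma", "fund", "reported", "quarterly", "earnings", "above", "estimates"],
]

# Train skip-gram word2vec on the pooled sentences (sg=1 selects skip-gram).
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=50)

def context_vector(tokens, model):
    """Average token embeddings to form a context vector (assumed pooling step)."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

# Manually annotated role-relevance labels per sentence (1 = relevant, 0 = not);
# these labels are invented for illustration only.
labels = [1, 1, 0]
X = np.vstack([context_vector(s, model) for s in sentences])

# Train a classifier on the labeled context vectors.
clf = LogisticRegression().fit(X, labels)

# Predict contextual role relevance for an unseen test sentence.
test = ["acme", "corp", "extended", "a", "credit", "line", "to", "beta", "bank"]
print(clf.predict([context_vector(test, model)]))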

