hep-th

06/27/2018
by Yang-Hui He, et al.

We apply techniques in natural language processing, computational linguistics, and machine learning to investigate papers in hep-th and four related sections of the arXiv: hep-ph, hep-lat, gr-qc, and math-ph. All of the titles of papers in each of these sections, from the inception of the arXiv until the end of 2017, are extracted and treated as a corpus which we use to train the neural network Word2Vec. A comparative study of common n-grams, linear syntactical identities, word clouds, and word similarities is carried out. We find notable scientific and sociological differences between the fields. In conjunction with support vector machines, we also show that the syntactic structure of the titles in the different sub-fields of high energy and mathematical physics is sufficiently distinctive that a neural network can perform a binary classification of formal versus phenomenological sections with 87.1% accuracy, and a finer five-fold classification across all sections with 65.1% accuracy.
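
The abstract describes a concrete pipeline: train Word2Vec on the pooled corpus of titles, embed each title, and classify the embeddings with a support vector machine. Below is a minimal sketch of such a pipeline in Python, using gensim and scikit-learn as plausible stand-ins for the paper's tooling. The titles/<section>.txt file layout, the hyper-parameters, and the grouping of sections into "formal" versus "phenomenological" are all illustrative assumptions, not details taken from the paper.

from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

SECTIONS = ["hep-th", "hep-ph", "hep-lat", "gr-qc", "math-ph"]
# Illustrative grouping for the binary task; the paper's exact
# assignment of sections should be checked against its text.
FORMAL = {"hep-th", "gr-qc", "math-ph"}

def load_titles(section):
    # Hypothetical layout: one pre-fetched, lower-cased title per line.
    with open(f"titles/{section}.txt") as fh:
        return [line.split() for line in fh if line.strip()]

corpus = {s: load_titles(s) for s in SECTIONS}

# Train Word2Vec on the pooled titles of all five sections.
all_titles = [t for titles in corpus.values() for t in titles]
w2v = Word2Vec(all_titles, vector_size=100, window=5, min_count=5, workers=4)

def title_vector(tokens):
    # Represent a title by the mean of its in-vocabulary word vectors.
    vecs = [w2v.wv[w] for w in tokens if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.array([title_vector(t) for s in SECTIONS for t in corpus[s]])
y = np.array([int(s in FORMAL) for s in SECTIONS for t in corpus[s]])

# Binary formal-vs-phenomenological classification with an SVM.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("binary accuracy:", accuracy_score(y_test, clf.predict(X_test)))

Replacing the binary labels with the section index (0 through 4) turns the same setup into the five-fold classification the abstract reports, and word-similarity queries such as w2v.wv.most_similar("brane") support the kind of comparative vocabulary study it mentions.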


research
11/10/2017

Efficient Representation for Natural Language Processing via Kernelized Hashcodes

Kernel similarity functions have been successfully applied in classifica...
research
08/08/2022

Gradient Flows for L2 Support Vector Machine Training

We explore the merits of training of support vector machines for binary ...
research
01/16/2021

Towards Searching Efficient and Accurate Neural Network Architectures in Binary Classification Problems

In recent years, deep neural networks have had great success in machine ...
research
03/07/2019

Predicting Research Trends From Arxiv

We perform trend detection on two datasets of Arxiv papers, derived from...
research
05/24/2023

The ACL OCL Corpus: Advancing Open Science in Computational Linguistics

We present a scholarly corpus from the ACL Anthology to assist Open scie...
