Automated essay scoring with string kernels and word embeddings

04/21/2018
by Mădălina Cozma, et al.

In this work, we present an approach for automatic essay scoring based on combining string kernels and word embeddings. String kernels capture the similarity among strings by counting common character n-grams, a low-level yet powerful type of feature that has yielded state-of-the-art results in various text classification tasks, such as Arabic dialect identification and native language identification. To the best of our knowledge, we are the first to apply string kernels to automatically score essays. We are also the first to combine them with a high-level semantic feature representation, namely the bag-of-super-word-embeddings. We report the best performance on the Automated Student Assessment Prize data set, in both in-domain and cross-domain settings, surpassing recent state-of-the-art deep learning approaches.
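The abstract defines string kernels only as similarity measures that count common character n-grams. A minimal Python sketch of three variants from the string-kernel literature (spectrum, presence, and intersection) makes that idea concrete. Which variant and which n-gram lengths the paper actually uses are not stated here, so the function names, the `kind` switch, and the default range of 1 to 3 characters are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int) -> Counter:
    """Count every contiguous character n-gram (spaces and punctuation included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def string_kernel(s: str, t: str, n_min: int = 1, n_max: int = 3,
                  kind: str = "intersection") -> float:
    """Similarity of two strings from the character n-grams they share.

    Hypothetical helper (not from the paper). For each n in [n_min, n_max]:
      kind="spectrum":     sum over shared n-grams of count_s * count_t
      kind="presence":     number of distinct shared n-grams
      kind="intersection": sum over shared n-grams of min(count_s, count_t)
    """
    total = 0
    for n in range(n_min, n_max + 1):
        a, b = char_ngrams(s, n), char_ngrams(t, n)
        shared = a.keys() & b.keys()  # n-grams occurring in both strings
        if kind == "spectrum":
            total += sum(a[g] * b[g] for g in shared)
        elif kind == "presence":
            total += len(shared)
        elif kind == "intersection":
            total += sum(min(a[g], b[g]) for g in shared)
        else:
            raise ValueError(f"unknown kernel kind: {kind}")
    return float(total)


def normalized(s: str, t: str, **kw) -> float:
    """Cosine-style normalization so that every string has self-similarity 1."""
    denom = sqrt(string_kernel(s, s, **kw) * string_kernel(t, t, **kw))
    return string_kernel(s, t, **kw) / denom if denom else 0.0


if __name__ == "__main__":
    a = "Computers help students learn."
    b = "Computers help people learn faster."
    for kind in ("spectrum", "presence", "intersection"):
        print(kind, round(normalized(a, b, kind=kind), 3))
```

Normalizing matters for essays because raw n-gram counts grow with text length, so without it a long essay would look similar to everything.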
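The abstract names the bag-of-super-word-embeddings (BOSWE) but does not define it or say how it is combined with string kernels. The sketch below assumes the construction from earlier work on this representation: word vectors are clustered with k-means, each centroid serves as a "super word embedding", and an essay becomes a histogram counting how many of its words fall nearest each centroid. It further assumes the two views are combined by summing normalized kernel matrices and that scores are predicted with kernel ridge regression. The abstract specifies neither the combination rule nor the learner. The two-dimensional toy embeddings, the deterministic k-means initialization, the example scores, and all function names are hypothetical.

```python
import numpy as np

# Toy 2-D "word embeddings" (hypothetical; real ones would be pretrained, e.g. 300-D).
EMB = {w: np.array(v, dtype=float) for w, v in {
    "good": [1.0, 0.0], "great": [0.9, 0.1],
    "bad": [0.0, 1.0], "poor": [0.1, 0.9],
}.items()}


def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means, initialized deterministically from the first k rows."""
    centroids = X[:k].astype(float)  # astype returns a copy, so X is not modified
    for _ in range(iters):
        # Squared Euclidean distance from every vector to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def boswe(tokens, emb, centroids):
    """Histogram counting how many known tokens fall nearest each super word embedding."""
    hist = np.zeros(len(centroids))
    for tok in tokens:
        if tok in emb:  # out-of-vocabulary tokens are skipped
            hist[((centroids - emb[tok]) ** 2).sum(axis=1).argmin()] += 1
    return hist


def presence_string_kernel(docs, n=3):
    """Number of distinct character n-grams shared by each pair of documents."""
    sets = [{d[i:i + n] for i in range(len(d) - n + 1)} for d in docs]
    return np.array([[len(a & b) for b in sets] for a in sets], dtype=float)


def normalize_kernel(K):
    """K[i, j] / sqrt(K[i, i] * K[j, j]); assumes no document has zero self-similarity."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def krr_predict(K_train, y, K_test_train, lam=0.1):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y, prediction = K_test_train @ alpha."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y)), y)
    return K_test_train @ alpha


def combined_kernel(docs, centroids):
    """Sum of the normalized string kernel and the normalized (linear) BOSWE kernel."""
    H = np.array([boswe(d.lower().split(), EMB, centroids) for d in docs])
    return normalize_kernel(presence_string_kernel(docs)) + normalize_kernel(H @ H.T)


if __name__ == "__main__":
    centroids = kmeans(np.array(list(EMB.values())), k=2)
    train = ["good great answer", "great good points", "bad poor answer", "poor bad points"]
    y = np.array([4.0, 4.0, 1.0, 1.0])  # hypothetical essay scores
    test = ["good answer", "poor answer"]
    K = combined_kernel(train + test, centroids)  # kernel over all docs, then slice
    n = len(train)
    print(krr_predict(K[:n, :n], y, K[n:, :n]))
```

Summing two positive semidefinite kernels yields another valid kernel, so the character-level and semantic views can feed one kernel learner without concatenating their very different feature spaces.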
