Text classification in shipping industry using unsupervised models and Transformer based supervised models

12/21/2022
by Ying Xie, et al.

Obtaining labelled data in a particular context can be expensive and time-consuming. Although different approaches, including unsupervised learning, semi-supervised learning, and self-learning, have been adopted, the performance of text classification varies with context. Given the lack of a labelled dataset, we proposed a novel and simple unsupervised text classification model to classify cargo content in the international shipping industry using the Standard International Trade Classification (SITC) codes. Our method represents words using pretrained GloVe word embeddings and finds the most likely label using cosine similarity. To compare the unsupervised text classification model with supervised classification, we also applied several Transformer models to classify cargo content. Due to the lack of training data, the SITC numerical codes and the corresponding textual descriptions were used as training data. A small number of manually labelled cargo content records was used to evaluate the classification performance of the unsupervised classification and the Transformer-based supervised classification. The comparison reveals that unsupervised classification significantly outperforms Transformer-based supervised classification even after increasing the size of the training dataset by 30 [...] learning models (such as Transformers) from successful practical applications. Unsupervised classification can provide an efficient and effective alternative method to classify text when training data is scarce.
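
The embedding-plus-cosine-similarity idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how word vectors are combined, so averaging them per text is an assumption here. The tiny three-dimensional vectors stand in for real pretrained GloVe embeddings, and the two label descriptions are hypothetical placeholders, not actual SITC entries.

```python
import math

# Made-up 3-d vectors standing in for pretrained GloVe embeddings (assumption).
TOY_EMBEDDINGS = {
    "fresh": [0.90, 0.10, 0.00],
    "fruit": [0.80, 0.20, 0.10],
    "bananas": [0.85, 0.15, 0.05],
    "steel": [0.10, 0.90, 0.10],
    "iron": [0.05, 0.95, 0.00],
    "pipes": [0.20, 0.80, 0.20],
}

# Hypothetical label descriptions; real SITC codes and texts would go here.
TOY_LABELS = {"fruit": "fresh fruit", "metal": "iron steel"}


def embed_text(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of the known words in text; None if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, labels=TOY_LABELS):
    """Return the label whose description is most similar to text, or None."""
    text_vec = embed_text(text)
    if text_vec is None:
        return None
    best_label, best_score = None, -2.0  # cosine similarity is always >= -1
    for label, description in labels.items():
        label_vec = embed_text(description)
        if label_vec is None:
            continue
        score = cosine_similarity(text_vec, label_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With these toy vectors, `classify("steel pipes")` picks the `"metal"` label, and a text containing no known words yields `None`. No training step is involved, which is why the approach can work when labelled data is scarce.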


research
01/22/2019

Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings

We propose a novel and simple method for semi-supervised text classifica...
research
08/11/2023

Weakly Supervised Text Classification on Free Text Comments in Patient-Reported Outcome Measures

Free text comments (FTC) in patient-reported outcome measures (PROMs) da...
research
02/08/2023

CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data

Financial sector and especially the insurance industry collect vast volu...
research
06/05/2022

Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for Cyberbullying Classification

The task of text classification using Bidirectional based LSTM architect...
research
11/14/2020

Supervised Text Classification using Text Search

Supervised text classification is a classical and active area of ML rese...
research
04/28/2020

Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification

Learning continuous representations from unlabeled textual data has been...
research
09/10/2019

Spam filtering on forums: A synthetic oversampling based approach for imbalanced data classification

Forums play an important role in providing a platform for community inte...
