Arabic aspect based sentiment analysis using BERT

by Mohammed M. Abdelgwad, et al.

Aspect-based sentiment analysis (ABSA) is a textual analysis methodology that defines the polarity of opinions on certain aspects related to specific targets. The majority of research on ABSA is in English, with a small amount of work available in Arabic. Most previous Arabic research has relied on deep learning models that depend primarily on context-independent word embeddings (e.g., word2vec), where each word has a fixed representation independent of its context. This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT, together with sentence-pair input, on Arabic ABSA tasks. In particular, we build a simple but effective BERT-based neural baseline to handle this task. According to the experimental results on the benchmark Arabic hotel reviews dataset, our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works.




1 Introduction

ABSA is a more difficult task than conventional sentiment analysis: it is concerned with identifying the aspect terms mentioned in a document, as well as the sentiment expressed toward each aspect.
As demonstrated by (b0), sentiment analysis can be studied at three levels. At the document level, the task is to identify the sentiment polarity (positive, neutral, or negative) expressed throughout the entire document. The sentence level is concerned with classifying the sentiment of a single sentence. However, a document contains many sentences, and each sentence may contain multiple aspects with different sentiments, so document-level and sentence-level sentiment analysis may not be accurate enough. A third, finer-grained level of analysis is therefore needed, which is ABSA.

ABSA was first introduced as a shared task at SemEval-2014 (b1), along with datasets containing annotated restaurant and laptop reviews. The task was repeated at SemEval over the next two years (b2; b3), as it was extended to various domains, languages, and challenges. SemEval-2016 provided 39 datasets in 7 domains and 8 languages for the ABSA task; in addition, a Support Vector Machine (SVM) was provided as a baseline evaluation procedure for the datasets.

There are three primary tasks common to ABSA, as mentioned in (b3): aspect category identification (Task 1), aspect opinion target extraction (Task 2), and aspect polarity detection (Task 3). In this paper, we concentrate only on Task 3.
Neural network (NN) variants have been applied to many Arabic NLP applications, such as sentiment analysis (b35), machine translation (b36), named entity recognition (b37), and speech recognition (b38), and have obtained the highest results.

Using word embeddings (distributed representations) enhances neural network efficiency and improves the performance of deep learning (DL) models; they are therefore used as a preliminary layer in various DL models.

Two types of word embeddings are available: contextualized and non-contextualized. Most of the available research on Arabic ABSA is based on non-contextualized word embeddings such as word2vec and fastText. The main drawback of non-contextualized word embeddings is that each word receives a single static vector that does not take into account the various contexts in which the word may appear.

In contrast, pre-trained transformer-based language models such as BERT provide dynamic embedding vectors that change with the context of a word in the sentence. This has encouraged the use of BERT in many tasks by fine-tuning it on the downstream dataset of the task.

Although Arabic has a significant number of speakers (estimated at around 422 million (b4)) and is a morphologically rich language, the number of studies on Arabic aspect-based sentiment analysis is still limited.
The following are the paper’s key contributions:

  • This paper examines the modeling competence of contextual embeddings from pre-trained language models such as BERT, with sentence-pair input, on the Arabic aspect sentiment classification task.

  • A simple BERT-based model with a linear classification layer is proposed to solve the aspect sentiment polarity classification task. Experiments conducted on the Arabic hotel reviews dataset demonstrate that, despite its simplicity, the model surpasses the state-of-the-art works.

The rest of the paper is organized as follows. Section 2 addresses the related work; Section 3 illustrates the proposed model; Section 4 explains the dataset and the obtained results; finally, Section 5 concludes the paper.

2 Related Work

ABSA is an area of sentiment analysis (SA) whose research methodologies fall into two approaches: standard machine learning techniques and DL-based techniques.

The earliest ABSA efforts depended mainly on machine learning methods that rely on handcrafted features, such as lexicons, to train sentiment classifiers (b12)(b13). These methods are effective, but they depend heavily on the quality of the handcrafted features. Subsequently, a set of neural network methods consisting of a word embedding layer followed by a neural architecture was developed for the ABSA task, and promising results were achieved (b14; b15; b25).
Several attention-based models have been applied to the ABSA task because of their ability to focus on the parts of the sentence that are relevant to the aspects (b16) (b17). However, a single attention network may not be enough to capture the key dependencies between context and targets, especially when the sentence is long, so multiple attention networks were proposed to solve this problem (b20; b21).
To address the question-answering problem, the authors of (b42) developed the memory network (MemNN) concept, which was later adopted in several NLP tasks, including ABSA (b22; b23; b24).
Pre-trained language models have recently played a vital role in many NLP applications, as they can take advantage of the vast volume of unlabeled data to learn general language representations; ELMo (b27), GPT (b28), and BERT (b29) are among the most well-known examples. The authors of (b41) studied the use of BERT embeddings with several neural models, such as a linear layer, a GRU model, a self-attention network, and a conditional random field, to deal with the ABSA task. In (b40), the authors demonstrated that treating ABSA as a sentence-pair classification task, by constructing an auxiliary sentence as input to BERT, significantly improved the results. The strength of BERT embeddings in ABSA was further investigated by (b30; b31; b32; b33).
In general, Arabic ABSA research has developed more slowly than English ABSA research. (b8) combined the aspect embedding with each word embedding and fed the mixture to a CNN model to address the aspect polarity and category identification tasks. Subsequently, (b10) took advantage of modeling the internal information of the hierarchical review structure in solving the ABSA task.
Two supervised machine learning-based techniques, an RNN and an SVM supplemented with a set of handcrafted features, were suggested by (b34); the SVM achieved excellent results, but the RNN was faster in terms of training time. (b7) combined the aspect embedding with each word embedding to encourage learning the connections between context words and targets, fed the mixture to an LSTM, and then applied the attention mechanism to focus on the context words related to specific aspects. (Abdelgwad et al., 2021) proposed applying an IAN network supported by Bi-GRU to extract better target and context representations. (b9) proposed applying a deep memory network based on a stack of IndyLSTMs, supplemented with a recurrent attention network, to solve the aspect sentiment classification task.
In this paper, a simple BERT-based model with sentence-pair input and a linear classification layer is proposed to solve the Arabic ABSA task, and state-of-the-art results are achieved on the Arabic hotel reviews dataset.

3 Model

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning technique for natural language processing (NLP) that learns language representations from unlabeled text using a bidirectional model built on the Transformer (an architecture in which each output element is linked to each input element, and the weights between them are determined dynamically through the attention mechanism). BERT is pre-trained on two separate but related tasks that exploit this bidirectional capability: Masked Language Modeling and Next Sentence Prediction. BERT effectively handles ambiguity, which is the most difficult aspect of understanding natural language, and can reach a high degree of accuracy, close to that of humans, in analyzing language. We first use the BERT component with L transformer layers to compute the contextualized representations for the T-length input token sequence. These contextual representations are then fed to the task-specific layer to predict sentiment polarity labels. The overall architecture of the proposed model is depicted in Figure 1.


Figure 1: Model overall architecture.

3.1 Auxiliary Sentence

Since the BERT model accepts either a single sentence or a pair of sentences as input, and since it is effective at sentence-pair classification tasks as a result of its masked language modeling and next sentence prediction pre-training, the ABSA task can be transformed into a sentence-pair classification task using the pre-trained BERT model. The first sentence contains the words that express sentiment, and the second sentence (the auxiliary sentence) contains information related to the aspect. In other words, the model receives the review sentence as the first sentence and the aspect terms as the auxiliary sentence, and the task is to determine the sentiment toward each aspect.
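The sentence-pair construction can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_sentence_pair`, the English example review, and the use of whitespace splitting in place of BERT's subword tokenizer are all assumptions made for illustration.

```python
def build_sentence_pair(review_tokens, aspect_tokens):
    """Return (tokens, segment_ids) for a BERT-style sentence-pair input."""
    tokens = ["[CLS]"] + review_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the review, and the first [SEP];
    # segment 1 covers the auxiliary sentence and the final [SEP].
    segment_ids = [0] * (len(review_tokens) + 2) + [1] * (len(aspect_tokens) + 1)
    return tokens, segment_ids


# Hypothetical example; whitespace splitting stands in for subword tokenization.
review = "the room was clean but the staff was rude".split()

# One input is built per aspect, so each aspect gets its own polarity prediction.
pairs = {aspect: build_sentence_pair(review, aspect.split())
         for aspect in ["room", "staff"]}

tokens, segment_ids = pairs["staff"]
print(tokens)
print(segment_ids)
```

A review that mentions several aspects therefore yields several inputs that share the same first sentence and differ only in the auxiliary sentence.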

3.2 BERT as Embedding layer

In contrast to the conventional word2vec embedding layer, which provides static, context-independent word vectors, the BERT layers provide dynamic, context-dependent word embeddings: they take the entire sentence as input and compute the representation of each word using information from the whole sentence.
BERT processes its inputs in a particular way. Sentences are first tokenized, as in any model; in addition, special tokens are inserted at the start ([CLS]) and end ([SEP]) of the tokenized sentence. Because the self-attention mechanism processes tokens in parallel, and because of the next sentence prediction task, additional embeddings must be added so that all the necessary information is included.
The tokenized sentence, with its [CLS] and [SEP] tokens, is first fed into the embedding layer, which produces the token embeddings. These token embeddings do not contain position information, which is added by means of position embeddings. Finally, it must be indicated whether each token belongs to sentence A or sentence B; this is done by adding a segment embedding.
The token embeddings, segment embeddings, and position embeddings of each token are then combined, and the result is fed into the transformer layers to refine the token-level features. The output representation of the $l$-th transformer layer, $H^{l}$, is fed as input into the next layer, where $H^{l} = \{h^{l}_{1}, \ldots, h^{l}_{T}\}$. The representation at the $l$-th layer is calculated as follows:

$$H^{l} = \mathrm{Transformer}_{l}(H^{l-1}), \quad l = 1, \ldots, L$$

where $T$ refers to the number of input tokens. Outputs from the last transformer layer are considered as full contextual representations for the input tokens and are used as input to the task-specific layer.
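The way the three embeddings are combined before the first transformer layer can be sketched as follows. This is a toy illustration under stated assumptions: the vocabulary, the hidden size of 4, and the random tables are hypothetical, and the combination is written as an element-wise sum, as in the standard BERT input layer (layer normalization and dropout, which BERT applies afterwards, are left out).

```python
import random

random.seed(0)
HIDDEN = 4  # toy size for illustration; BERT-Base uses 768


def make_table(num_rows, dim=HIDDEN):
    """Create a toy embedding table filled with small random values."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_rows)]


vocab = {"[CLS]": 0, "[SEP]": 1, "clean": 2, "room": 3}  # hypothetical vocabulary
token_table = make_table(len(vocab))
segment_table = make_table(2)    # sentence A / sentence B
position_table = make_table(16)  # toy maximum sequence length


def embed(tokens, segment_ids):
    """Sum the token, segment, and position embeddings of each input token."""
    rows = []
    for pos, (tok, seg) in enumerate(zip(tokens, segment_ids)):
        t = token_table[vocab[tok]]
        s = segment_table[seg]
        p = position_table[pos]
        rows.append([a + b + c for a, b, c in zip(t, s, p)])
    return rows


tokens = ["[CLS]", "clean", "room", "[SEP]", "room", "[SEP]"]
segment_ids = [0, 0, 0, 0, 1, 1]
H0 = embed(tokens, segment_ids)  # input to the first transformer layer
print(len(H0), len(H0[0]))
```

The word "room" appears twice with the same token embedding, yet its two input vectors differ because the position and segment embeddings differ.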

3.3 Design of Downstream Model

In order to identify sentiment polarities toward aspects (i.e., Task 3), the word embeddings extracted from the BERT model are fed into a task-specific layer, a simple linear layer in our case. This layer multiplies its input by a weight matrix and adds a bias term (both learnable parameters), transforming the incoming features into output features in a linear manner. The softmax function is then used to determine the likelihood of each category.
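A linear layer followed by softmax can be sketched in a few lines. The numbers below are made up, the 4-dimensional vector `h` stands in for a 768-dimensional BERT output, and which BERT output vector feeds the classifier (for example, the [CLS] representation) is an assumption that the text does not settle.

```python
import math

LABELS = ["positive", "negative", "neutral"]


def linear(x, W, b):
    """Compute W x + b for a single feature vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature vector and parameters (one weight row per label).
h = [0.5, -1.0, 0.25, 2.0]
W = [[0.2, -0.1, 0.0, 0.3],
     [-0.3, 0.4, 0.1, -0.2],
     [0.0, 0.0, 0.5, 0.1]]
b = [0.0, 0.1, -0.1]

probs = softmax(linear(h, W, b))
prediction = LABELS[max(range(len(probs)), key=probs.__getitem__)]
print(prediction, [round(p, 3) for p in probs])
```

During fine-tuning, `W` and `b` would be learned jointly with the BERT parameters; here they are fixed only to show the computation.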


4 Experiments

4.1 Data and baseline research

We conducted our experiments on the Arabic Hotel Reviews Dataset, which was presented in SemEval-2016 in support of ABSA’s multilingual task involving work in 8 languages and 7 domains (b3). There are 19,226 training tuples and 4802 testing tuples in the dataset. The XML schema was used to annotate the dataset. The dataset consists of a set of reviews, each review contains a number of sentences, with each sentence having three tuples: aspect-category, OTE, and aspect polarity. Figure 2 depicts an XML snapshot that corresponds to an annotated review.

Figure 2: Example of the Arabic Hotels Dataset Schema.

The dataset supports both text-level annotations (2291 reviews) and sentence-level annotations (6029 sentences). This study focuses only on sentence-level tasks. The dataset’s scale and distribution are shown in Table 1. In addition, an SVM classifier with N-gram features was applied to the Arabic hotel reviews dataset for various ABSA tasks, and it is used as the baseline for comparison.

                           Train                         Test
Task                       Texts  Sentences  Tuples      Texts  Sentences  Tuples
T1: Sentence-level ABSA    1839   4802       10,509      452    1227       2604
T2: Text-level ABSA        1839   4802       8757        452    1227       2158
Table 1: The dataset’s scale and distribution.

4.2 Evaluation method

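In order to determine the effectiveness of the proposed model, the accuracy metric was adopted, which is defined as follows:

$$\mathrm{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$

Accuracy measures the ratio of correctly classified samples to all samples; higher accuracy indicates better performance.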

4.3 Hyperparameters Setting

The pre-trained “Arabic BERT” (b39) was used, which was previously trained on about 8.2 billion words of MSA and dialectal Arabic. Specifically, the BERT-Base model was used, consisting of 12 hidden layers, 12 attention heads, and a hidden size of 768. The Adam optimizer was used to fine-tune the model on the downstream task with a learning rate of 1e-5, a dropout rate of 0.1, a batch size of 24, and 10 epochs.
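For reproducibility, the reported settings can be collected in one configuration object. The sketch below only restates the values given in this section; the class name `FineTuneConfig`, the field names, and the `checkpoint` placeholder are hypothetical and do not refer to any released code or model path.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FineTuneConfig:
    # Architecture of the BERT-Base model, as reported in this section.
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    hidden_size: int = 768
    # Fine-tuning settings, as reported in this section.
    optimizer: str = "adam"
    learning_rate: float = 1e-5
    dropout: float = 0.1
    batch_size: int = 24
    num_epochs: int = 10
    # Hypothetical placeholder, not a real checkpoint location.
    checkpoint: str = "path/to/arabic-bert-base"


config = FineTuneConfig()
print(asdict(config))
```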

4.4 Comparison Models

Baseline trained an SVM classifier enhanced with N-gram features (b3).
INSIGHT-1 combined the aspect embedding with each word embedding and fed the resulting mixture to a CNN for aspect sentiment analysis (b8).
HB-LSTM developed a hierarchical bidirectional LSTM for ABSA that takes advantage of the hierarchical structure of the review to improve performance (b10).
AB-LSTM-PC combined the aspect embedding with each word embedding to encourage learning the connections between context words and targets, then applied the attention mechanism to focus on the context words related to specific aspects (b7).
IAN-BGRU used Bi-GRU to extract hidden representations from targets and context, then applied two associated attention networks to those representations to model targets and their context interactively (Abdelgwad et al., 2021).
MBRA made use of an external memory network containing a stack of three bidirectional IndyLSTM layers, and a recurrent attention mechanism to deal with complex sentence structures (b9).

4.5 Main Results

Table 2 shows that simply adding a basic linear layer on top of BERT outperformed the baseline and achieved better results than many previous Arabic DL models. This is evidence of the effectiveness of BERT’s contextual representations at encoding associations between aspect terms and context words. Moreover, the use of the auxiliary sentence further improved the results of the BERT model, as shown by the higher accuracy of the BERT-pair model compared to the BERT-single model, achieving state-of-the-art results.

Model               Accuracy (%)
Baseline            76.4
AB-LSTM-PC          82.60
INSIGHT-1 (CNN)     82.71
IAN-BGRU            83.98
MBRA                87.31
BERT-Linear-single  85.93
BERT-Linear-pair    89.51
Table 2: Model accuracy results on Task 3 (T3).

4.6 Overfitting Issue

Although we used the BERT-Base model, the smallest pre-trained version of BERT, its number of parameters (110M) seemed large for this task, which made us wonder whether our model was overfitting the downstream data. We therefore trained the BERT-linear model on the Arabic hotel reviews dataset for 10 epochs and monitored the accuracy on the development set after each epoch. As indicated in Figure 3, the accuracy on the development set is relatively stable and does not decrease significantly as training progresses, which suggests that the BERT model is robust to overfitting.

Figure 3: Performances on the Dev set.

4.7 Fine-tuning or Not

We investigated the effect of fine-tuning on the final results by keeping the parameters of the BERT component fixed during the training phase. Figure 4 shows a simple comparison between the performance of the model with fine-tuning and with the parameters fixed. The general-purpose BERT representations are clearly insufficient for the downstream task on their own, and task-specific fine-tuning is necessary to exploit BERT’s capabilities and improve performance.

Figure 4: Effect of fine-tuning BERT.

5 Conclusion

In this paper, we explored the modeling capabilities of contextual embeddings from the pre-trained BERT model, with the benefit of sentence-pair input, on the Arabic ABSA task. Specifically, we examined the combination of the BERT embedding component with a simple linear classification layer, and extensive experiments were performed on the Arabic hotel reviews dataset. The experimental results show that, despite its simplicity, our model surpassed the state-of-the-art works and is robust to overfitting.