Pre-trained language models (LMs) Peters et al. (2018); Radford et al.; Radford et al. (2019); Devlin et al. (2018) aim to learn general (or mixed-domain) knowledge for end tasks. Recent studies Xu et al. (2019); Gururangan et al. (2020) show that learning domain-specific LMs is equally important because general-purpose LMs lack sufficient focus on domain details. This is partially because the training corpus of general LMs is out-of-domain for domain end tasks and, more importantly, because mixed-domain weights may not capture long-tailed and underrepresented domain details Xu et al. (2018) (see Section 4). An intuitive example can be found in Table 1, where all masked words sky, water, idea, screen and picture can appear in a mixed-domain corpus. A general-purpose LM may favor frequent examples and ignore long-tailed choices in certain domains.
|The [MASK] is clear.| |
|The sky is clear.|Astronomy [Irrelevant Domain]|
|The water is clear.|Liquids [Irrelevant Domain]|
|The idea is clear.|Concepts [Irrelevant Domain]|
|The screen is clear.|Desktop [Relevant Domain]|
|The picture is clear.|Laptop [Target Domain]|
In contrast, although domain-specific LMs can capture fine-grained domain details, they may suffer from an insufficient training corpus Gururangan et al. (2020) to strengthen general knowledge within a domain. To this end, we propose a domain-oriented learning task that aims to combine the benefits of both the general and the domain-specific worlds:
Domain-oriented Learning: Given a target domain t and a set of diverse source domains S, perform (language model) learning that focuses on t and all its relevant domains in S.
This learning task resolves issues that exist in both the general and the domain-specific world. On one hand, the training of the LM no longer needs to focus on unrelated domains (e.g., Books is one big domain on Amazon, but a major focus on Books may not be very helpful for end tasks in Laptop); on the other hand, although an in-domain corpus may be limited, other relevant domains can share a great amount of knowledge (e.g., Desktop in Table 1) to make the in-domain corpus more diverse and general.
This paper proposes an extremely simple extension of BERT Devlin et al. (2018) called DomBERT to learn domain-oriented language models. DomBERT divides a mixed-domain corpus by domain tags and learns to re-balance the training examples for the target domain. Similar to other LMs, we categorize DomBERT as a self-supervised learning model because domain tags naturally exist online and do not require task-specific human annotations (supervised learning needs extra human annotations), ranging from Wikipedia, news articles, blog posts and QAs to customer reviews. DomBERT simultaneously learns masked language modeling and discovers relevant domains from which to draw training examples, where the latter is computed from domain embeddings learned via an auxiliary task of domain classification. We apply DomBERT to end tasks in aspect-based sentiment analysis (ABSA) in low-resource settings, demonstrating promising results.
The main contributions of this paper are threefold:
We propose the task of domain-oriented learning, which aims to learn language models focusing on a target domain and its relevant domains.
We propose DomBERT, which is an extension of BERT with the capability to draw examples from relevant domains from a pool of diverse domains.
Experimental results demonstrate that DomBERT is promising in low-resource settings for aspect-based sentiment analysis.
2 Related Work
Pre-trained language models gain significant improvements over a wide spectrum of NLP tasks, including ELMo Peters et al. (2018), GPT/GPT-2 Radford et al.; Radford et al. (2019), BERT Devlin et al. (2018), XLNet Yang et al. (2019), RoBERTa Liu et al. (2019), ALBERT Lan et al. (2019) and ELECTRA Clark et al. (2019). This paper extends BERT's masked language model (MLM) with domain knowledge learning. Following RoBERTa, the proposed DomBERT leverages dynamic masking, removes the next sentence prediction (NSP) task (which has been shown to have negative effects on the pre-trained parameters), and allows for max-length MLM to fully utilize the computational power. This paper also borrows ALBERT's removal of dropout, since pre-training an LM is, in general, an underfitting task that calls for more parameters rather than measures against overfitting.
The proposed domain-oriented learning task can be viewed as a type of transfer learning Pan and Yang (2009) that implicitly learns a strategy for transferring training examples from relevant (source) domains to the target domain. This transfer is conducted throughout the training of DomBERT.
The experiments of this paper focus on aspect-based sentiment analysis (ABSA), which typically requires a lot of domain-specific knowledge. Reviews serve as a rich resource for sentiment analysis Pang et al. (2002); Hu and Liu (2004); Liu (2012, 2015). ABSA aims to turn unstructured reviews into structured fine-grained aspects (such as the “battery” aspect of a laptop) and their associated opinions (e.g., “good battery” is positive about the aspect battery). This paper focuses on three popular tasks in ABSA: aspect extraction (AE), aspect sentiment classification (ASC) Hu and Liu (2004) and end-to-end ABSA (E2E-ABSA) Li et al. (2019b, c). AE aims to extract aspects (e.g., “battery”), ASC identifies the polarity of a given aspect (e.g., positive for battery), and E2E-ABSA combines AE and ASC to detect aspects and their associated polarities simultaneously.
AE and ASC are two important tasks in sentiment analysis Pang et al. (2002); Liu (2015). ASC is different from document- or sentence-level sentiment classification (SSC) Pang et al. (2002); Kim (2014); He and Zhou (2011); He et al. (2011) in that it focuses on fine-grained opinions about each specific aspect Shu et al. (2017); Xu et al. (2018). It is studied either as a single task or as a joint end-to-end task together with aspect extraction Wang et al. (2017); Li and Lam (2017); Li et al. (2019a). Most recent works use neural networks Dong et al. (2014); Nguyen and Shirai (2015); Li et al. (2018a). For example, memory networks Weston et al. (2014); Sukhbaatar et al. (2015) and attention mechanisms are extensively applied to ASC Tang et al. (2016); Wang et al. (2016a, b); Ma et al. (2017); Chen et al. (2017); Tay et al. (2018); He et al. (2018a); Liu et al. (2018b). ASC is also studied in transfer learning or domain adaptation settings, such as leveraging large-scale corpora that are unlabeled or weakly labeled (e.g., using the overall rating of a review as the label) Xu et al. (2019); He et al. (2018b) and transferring from other tasks/domains Li et al. (2018b); Wang et al. (2018a, b). Many of these models use handcrafted features, graph structures, lexicons, and complicated neural network architectures to remedy the insufficient training examples of both tasks. Although such approaches may achieve better performance by manually injecting human knowledge into the model, this paper aims to improve ABSA by leveraging unlabeled data in a self-supervised language modeling manner Xu et al. (2018); He et al. (2018c); Xu et al. (2019).
3 DomBERT
This section presents DomBERT, an extension of BERT for domain knowledge learning. We adopt post-training of BERT Xu et al. (2019) instead of training DomBERT from scratch, since post-training on BERT is more efficient (we aim for single-GPU training for all models in this paper and use the uncased base model given its lower training cost for academic purposes). Different from Xu et al. (2019), the main goal of domain-oriented training is to leverage both an in-domain corpus and a pool of corpora of source domains.
The goal of DomBERT is to discover relevant domains from the pool of source domains and use their training examples for masked language model learning. As a result, DomBERT has a sampling process over a categorical distribution on all domains (including the target domain) to retrieve relevant domains' examples. Learning such a distribution requires detecting the domain similarities between each source domain and the target domain. DomBERT learns an embedding for each domain and computes the similarities from these embeddings. The domain embeddings are learned from an auxiliary task called domain classification.
3.1 Domain Classification
Given a pool of source and target domains, one can easily form a classification task on domain tags, such that each text document has its domain as the label. Following RoBERTa Liu et al. (2019)'s max-length training examples, we pack different texts from the same domain, up to the maximum length, into a single training example.
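The packing step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pack_examples` and the whitespace tokenization are assumptions (the real pipeline operates on subword tokens up to BERT's maximum length).

```python
def pack_examples(texts, max_len):
    """Greedily pack texts from one domain into training examples whose
    total (whitespace-token) length stays within max_len."""
    examples, current, current_len = [], [], 0
    for text in texts:
        n = len(text.split())
        # Flush the current example when the next text would overflow it.
        if current and current_len + n > max_len:
            examples.append(" ".join(current))
            current, current_len = [], 0
        current.append(text)
        current_len += n
    if current:
        examples.append(" ".join(current))
    return examples
```

A single text longer than `max_len` still becomes one example here; truncation would be handled by the tokenizer.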
We let the number of source domains be n, so the total number of domains (including the target domain) is n+1. Let h denote the hidden state of the [CLS] token of BERT, which serves as the document-level representation of one example. We first pass this hidden state through a dense layer to reduce its size, then pass the reduced hidden state through a second dense layer to compute the logits over all domains:

z = W2 (W1 h),

where r is the size of the first dense layer and W1, W2 are trainable weights (biases omitted). Besides being a dense layer, W2 is essentially a concatenation of the domain embeddings, one row of size r per domain. We then apply the cross-entropy loss to the logits and the domain label to obtain the loss of domain classification.
To encourage the diversity of domain embeddings, we further compute a regularizer that penalizes pairwise similarity among the (normalized) domain embeddings, e.g., the squared Frobenius norm ||E E^T - I||^2, where the rows of E are the normalized domain embeddings. Minimizing this regularizer encourages the learned embeddings to be more orthogonal (thus diverse) to each other. Finally, we add the losses of domain classification, BERT's masked language model and the regularizer together:

L = L_MLM + λ L_DC + L_reg,

where λ controls the ratio between the masked language model loss and the domain classification loss.
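The loss combination can be sketched numerically. This is a hedged illustration: the exact regularizer in the paper is not reproduced above, so the sketch uses the standard ||E E^T - I||^2 orthogonality penalty over L2-normalized embeddings, which has the stated diversifying effect; all names are illustrative.

```python
import numpy as np

def ortho_reg(E):
    """Orthogonality regularizer over domain embeddings (rows of E):
    penalize pairwise cosine similarity so embeddings stay diverse."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    G = En @ En.T                                   # pairwise cosine similarities
    return float(np.sum((G - np.eye(len(E))) ** 2))  # off-diagonal mass

def joint_loss(mlm_loss, dc_loss, E, lam=1.0):
    """L = L_MLM + lam * L_DC + L_reg, following the description above."""
    return mlm_loss + lam * dc_loss + ortho_reg(E)
```

Normalizing the rows first makes the penalty depend only on the directions of the embeddings, not their scale.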
3.2 Domain Sampler
As a side product of domain classification, DomBERT has a built-in data sampling process to draw examples from both the target domain and relevant domains for further learning. This process follows a unified categorical distribution over all domains, which ensures that a good number of examples from both the target domain and relevant domains are sampled. As such, it is important that the target domain always has the highest probability of being sampled.
To this end, we use cosine similarity as the similarity function, which has the property that the target domain's similarity to itself is always maximal (cosine 1). For an arbitrary domain s, the probability of s being sampled is computed from a softmax over domain similarities:

p(s) = exp(cos(e_t, e_s) / T) / Σ_s' exp(cos(e_t, e_s') / T),

where e_t and e_s are domain embeddings and T is the temperature Hinton et al. (2015) that controls the importance of highly-ranked domains vs. long-tailed domains.
To form a mini-batch for the next training step, we sample domains following this categorical distribution up to the batch size and retrieve the next available example from each sampled domain. To support this, we maintain a randomly shuffled queue of examples for each domain; when the examples of one domain are exhausted, a new randomly shuffled queue is generated for that domain. The result is a data sampler that takes the categorical distribution over domains as input.
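The sampler described above can be sketched as a temperature-scaled softmax over cosine similarities, followed by categorical sampling. Function names are illustrative; the target domain compares with itself at cosine 1, so it always receives the highest probability.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def domain_probs(target_emb, domain_embs, T=1.0):
    """Softmax with temperature T over cosine similarities to the target."""
    sims = [cosine(target_emb, e) / T for e in domain_embs]
    m = max(sims)                                 # stabilized softmax
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

def sample_batch(probs, batch_size, rng=random):
    """Draw domain indices for one mini-batch from the categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

Lowering `T` sharpens the distribution toward the target and its closest domains; raising it spreads probability to long-tailed domains.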
3.3 Implementation Details
We adopt the popular transformers framework from Hugging Face (https://huggingface.co/transformers/), with the following minor improvements.
Early Apply of Labels:
We refactor the forward computation of BERT MLM with a method we call early apply of labels (EAL), which leverages the labels of MLM early in the forward computation to avoid computing invalid positions. Although MLM only uses 15% of tokens for prediction, typical implementations of BERT MLM still compute the logits over the vocabulary for all positions, which wastes both GPU computation and memory (it is expensive to multiply hidden states with word embeddings of vocabulary size). EAL keeps only the positions that need prediction when computing logits (we use torch.masked_select in PyTorch). This improves training speed from 2.2 to 3.2 iterations per second in our setting. A similar method can be applied to the cross-entropy loss of each token (because only the logit of the ground-truth token contributes to the loss), which is potentially useful for almost all tasks with large vocabulary sizes.
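The EAL idea can be sketched as follows. NumPy boolean indexing stands in for torch.masked_select here so the sketch is self-contained; shapes and names are illustrative, not the paper's code.

```python
import numpy as np

def mlm_logits_eal(hidden, labels, W_vocab, ignore_index=-100):
    """hidden: (batch, seq, h); labels: (batch, seq) with ignore_index at
    positions not chosen for MLM prediction; W_vocab: (h, vocab).
    Select the ~15% masked positions first, then project only those
    to the vocabulary instead of projecting every position."""
    keep = labels != ignore_index        # boolean mask of predicted positions
    selected = hidden[keep]              # (n_masked, h), like torch.masked_select
    return selected @ W_vocab, labels[keep]
```

The expensive (h × vocab) projection now runs on `n_masked` rows instead of `batch × seq` rows.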
Dropout Removal: Following ALBERT Lan et al. (2019), we turn off dropout during post-training because BERT is unlikely to overfit a large training corpus. This gives both a larger effective parameter capacity and faster training. Dropout is turned back on during end-task fine-tuning because BERT is typically over-parameterized for end tasks.
4 Experiments
4.1 End Task Datasets
We apply DomBERT to end tasks in aspect-based sentiment analysis from the SemEval datasets, focusing on the Laptop and Restaurant domains. Statistics of the datasets for AE, ASC and E2E-ABSA are given in Tables 2, 3 and 4, respectively. For AE, we choose SemEval 2014 Task 4 for laptop and SemEval 2016 Task 5 for restaurant, to be consistent with Xu et al. (2018) and other previous works. For ASC, we use SemEval 2014 Task 4 for both laptop and restaurant, as existing research frequently uses this version. We hold out 150 examples from the training set of each of these datasets for validation. For E2E-ABSA, we adopt the formulation of Li et al. (2019b), where the laptop data comes from SemEval 2014 Task 4 and the restaurant data is a combination of SemEval 2014-2016.
4.2 Domain Corpus
Based on the domains of the end tasks from SemEval, we explore the capabilities of large-scale unlabeled corpora from the Amazon review datasets He and McAuley (2016) and the Yelp dataset (https://www.yelp.com/dataset/challenge, 2019 version). Following Xu et al. (2019), we select all laptop reviews from the electronics department, which yields about 100 MB of corpus. Similarly, we simulate a low-resource setting for restaurants and randomly select about 100 MB of reviews tagged with Restaurants as their first category from Yelp reviews. For the source domains, we choose all reviews from the 5-core version of the Amazon review datasets and all Yelp reviews, excluding Laptop and Restaurants. Note that Yelp is not solely about restaurants but covers other location-based domains such as car services, banks, theatres, etc. This yields a large pool of domains, all but the target serving as source domains; the total corpus size is about 20 GB. The number of examples per domain is plotted in Figure 1, where the distribution of domains is heavily long-tailed.
4.3 Hyper-parameters
We adopt BERT-base (uncased) as the basis of all experiments due to the limited computational power of our academic setting and for the sake of reproducible research. We choose a small hidden size for the domain embeddings to ensure the regularizer term in the loss does not consume too much GPU memory, and fix the loss ratio λ and the temperature T accordingly. We leverage FP16 computation (https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) to reduce the actual size of tensors on the GPU and speed up training. We train with FP16-O2 optimization, which is faster and has a smaller GPU memory footprint than O1 optimization. Due to the uncertainty of DomBERT's online sampling, we define the number of training examples per epoch to be the number of examples in the target domain. As a result, we train DomBERT for 400 epochs to obtain enough training examples from relevant domains. The full batch size is set to 288 (a batch size of 24 with 12 gradient accumulation steps). The maximum length of DomBERT is 512, consistent with BERT. We use Adamax Kingma and Ba (2014) as the optimizer. Lastly, the learning rate is set to 5e-5.
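The effective-batch arithmetic above can be sketched in a few lines; the function name is illustrative, and the point is only that averaging gradients over 12 micro-batches of 24 examples matches one update over 288 examples.

```python
def sgd_step_with_accumulation(param, micro_grads, lr):
    """One optimizer update from the mean of the accumulated micro-batch
    gradients: equivalent to a single large-batch step when all
    micro-batches have equal size."""
    g = sum(micro_grads) / len(micro_grads)
    return param - lr * g

micro_batch, accum_steps = 24, 12
effective_batch = micro_batch * accum_steps   # 288, as stated above
```

Gradient accumulation trades wall-clock time for memory: only one micro-batch of activations lives on the GPU at a time.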
4.4 Compared Methods
We compare DomBERT with LM-based baselines (which require no extra human supervision such as parsing or fine-grained annotation).
BERT: the vanilla pre-trained model from Devlin et al. (2018), used to show the performance of BERT without any domain adaptation.
BERT-Review: BERT post-trained on all (mixed-domain) Amazon review datasets and the Yelp dataset, in a similar way to training BERT. Following Liu et al. (2019), we train on the whole corpus for 4 epochs, which took about 10 days of training (much longer than DomBERT).
BERT-DK: a baseline borrowed from Xu et al. (2019) that trains one LM per domain. Note that its restaurant model is trained on a 1 GB corpus that aligns well with the types of restaurants in SemEval, which is not a low-resource case; we use this baseline to show that DomBERT can reach competitive performance.
DomBERT: the model proposed in this paper.
4.5 Evaluation Metrics
For AE, we use the standard evaluation scripts that come with the SemEval datasets and report the F1 score. For ASC, we compute both accuracy and Macro-F1 over the 3 polarity classes, where Macro-F1 is the main metric because the imbalanced classes bias accuracy. To be consistent with existing research Tang et al. (2016), examples with the conflict polarity are dropped due to their very small number. For E2E-ABSA, we adopt the evaluation script from Li et al. (2019b), which reports precision, recall, and F1 on sequence labeling (of combined aspect labels and sentiment polarities).
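Macro-F1, the headline ASC metric above, averages per-class F1 without frequency weighting, so rare polarity classes count as much as frequent ones (unlike accuracy). A minimal sketch, with illustrative names:

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

A class with no true or predicted examples contributes an F1 of 0, which is what drags Macro-F1 down on imbalanced data.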
Results are reported as averages of 10 runs (10 different random seeds for random batch generation); we notice that the 5 runs adopted by existing research still leave too high a variance for a fair comparison.
4.6 Result Analysis and Discussion
Results on different tasks in ABSA exhibit different challenges.
AE: In Table 5, we notice that AE is a very domain-specific task. DomBERT further improves over BERT-DK, which only uses a domain-specific corpus. Note that BERT-DK for restaurant uses 1 GB of restaurant corpus, whereas DomBERT's target-domain corpus is just 100 MB; DomBERT thus learns additional domain-specific knowledge from relevant domains. Although the Yelp data contains a great portion of restaurant reviews, mixed-domain training as in BERT-Review does not yield enough domain-specific knowledge.
ASC: ASC is a more domain-agnostic task because most sentiment words (e.g., “good” and “bad”) are shared across domains. Still, in Table 6, we notice that ASC for restaurant is more domain-specific than for laptop. DomBERT is worse than BERT-Review on laptop because a 20+ GB corpus can learn general-purpose sentiment better; BERT-DK is better than DomBERT because its much larger in-domain corpus matters more for performance.
E2E-ABSA: By combining AE and ASC, E2E-ABSA exhibits more domain-specificity, as shown in Table 7. Here we can see the full strength of DomBERT, as it learns both general and domain-specific knowledge well. BERT-Review performs poorly, probably because it focuses on irrelevant domains such as Books.
We further examine the sampling process of DomBERT. In Table 8, we report the top-20 source domains selected by the data sampler at the end of training. The results match our intuition, as most domains are very close to laptop and restaurant, respectively.
|Model|Laptop (F1)|Restaurant (F1)|
|BERT Devlin et al. (2018)|79.28|74.1|
|BERT-DK Xu et al. (2019)|83.55|77.02|
|Model|Laptop Acc.|Laptop MF1|Restaurant Acc.|Restaurant MF1|
|BERT Devlin et al. (2018)|75.29|71.91|81.54|71.94|
|BERT-DK Xu et al. (2019)|77.01|73.72|83.96|75.45|
|Model|Laptop P|Laptop R|Laptop F1|Restaurant P|Restaurant R|Restaurant F1|
|Li et al. (2019b)|61.27|54.89|57.90|68.64|71.01|69.80|
|Luo et al. (2019)|-|-|60.35|-|-|72.78|
|He et al.|-|-|58.37|-|-|-|
|Lample et al. (2016)|58.61|50.47|54.24|66.10|66.30|66.20|
|Ma and Hovy (2016)|58.66|51.26|54.71|61.56|67.26|64.29|
|Liu et al. (2018a)|53.31|59.40|56.19|68.46|64.43|66.38|
|BERT+Linear Li et al. (2019c)|62.16|58.90|60.43|71.42|75.25|73.22|
|BERT Devlin et al. (2018)|61.97|58.52|60.11|68.86|73.00|70.78|
|BERT-DK Xu et al. (2019)|63.95|61.18|62.45|71.88|74.07|72.88|
|Relevant to Laptop|Relevant to Restaurant|
|Boot Shop (Men)|Coffee & Tea|
|Laptop & Netbook Computer Accessories|Bakeries|
|Computers & Accessories|Bars|
|Electronics Warranties|Arts & Entertainment|
|Antivirus & Security|Venues & Event Spaces|
|Unlocked Cell Phones|Dance Clubs|
|Power Strips|Tea Rooms|
|No-Contract Cell Phones|Event Planning & Services|
|Video Games/PC/Accessories|Sports Bars|
|MP3 Players & Accessories|Desserts|
5 Conclusion
This paper investigates the task of domain-oriented learning for language modeling. It aims to leverage the benefits of both large-scale mixed-domain training and in-domain specific knowledge learning. We propose a simple extension of BERT called DomBERT, which automatically exploits the power of training corpora from domains relevant to a target domain. Experimental results demonstrate that DomBERT is promising in a wide assortment of tasks in aspect-based sentiment analysis.
References
- Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 452–461. Cited by: §2.
- ELECTRA: pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations, Cited by: §2.
- BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1, §1, §2, §4.4, Table 5, Table 6, Table 7.
- Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: Short papers), Vol. 2, pp. 49–54. Cited by: §2.
- Don’t stop pretraining: adapt language models to domains and tasks. In Proceedings of ACL, Cited by: §1, §1.
-  An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Cited by: Table 7.
- Effective attention modeling for aspect-level sentiment classification. In Proceedings of the 27th International Conference on Computational Linguistics, pp. 1121–1131. Cited by: §2.
- Exploiting document knowledge for aspect-level sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Cited by: §2.
- Exploiting document knowledge for aspect-level sentiment classification. arXiv preprint arXiv:1806.04346. Cited by: §2.
- Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In World Wide Web, Cited by: §4.2.
- Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 123–131. Cited by: §2.
- Self-training from labeled features for sentiment analysis. Information Processing & Management 47 (4), pp. 606–616. Cited by: §2.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §3.2.
- Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 168–177. Cited by: §2.
- Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Cited by: §2.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.3.
- Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 260–270. Cited by: Table 7.
- ALBERT: a lite bert for self-supervised learning of language representations. In International Conference on Learning Representations, Cited by: §2, §3.3.
- Transformation networks for target-oriented sentiment classification. arXiv preprint arXiv:1805.01086. Cited by: §2.
- A unified model for opinion target extraction and target sentiment prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6714–6721. Cited by: §2, §4.1, §4.5, Table 7.
- Exploiting bert for end-to-end aspect-based sentiment analysis. arXiv preprint arXiv:1910.00883. Cited by: §2, Table 7.
- Deep multi-task learning for aspect term extraction with memory interaction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2886–2892. Cited by: §2.
- Exploiting coarse-to-fine task transfer for aspect-level sentiment classification. arXiv preprint arXiv:1811.10999. Cited by: §2.
- Sentiment analysis and opinion mining. Synthesis lectures on human language technologies 5 (1), pp. 1–167. Cited by: §2.
- Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press. Cited by: §2, §2.
- Empower Sequence Labeling with Task-Aware Neural Language Model. In AAAI, Cited by: Table 7.
- Content attention model for aspect based sentiment analysis. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 1023–1032. Cited by: §2.
- RoBERTa: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. Cited by: §2, §3.1, §4.4.
- DOER: dual cross-shared RNN for aspect term-polarity co-extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 591–601. Cited by: Table 7.
- Interactive attention networks for aspect-level sentiment classification. arXiv preprint arXiv:1709.00893. Cited by: §2.
- End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1064–1074. Cited by: Table 7.
- PhraseRNN: phrase recursive neural network for aspect-based sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 2509–2514. Cited by: §2.
- A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22 (10), pp. 1345–1359. Cited by: §2.
- Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pp. 79–86. Cited by: §2, §2.
- Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2227–2237. Cited by: §1, §2.
-  Improving language understanding by generative pre-training. Cited by: §1, §2.
- Language models are unsupervised multitask learners. OpenAI Blog 1 (8), pp. 9. Cited by: §1, §2.
- Lifelong learning CRF for supervised aspect extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vancouver, Canada, pp. 148–154. Cited by: §2.
- End-to-end memory networks. In Advances in neural information processing systems, pp. 2440–2448. Cited by: §2.
- Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 214–224. Cited by: §2, §4.5.
- Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2.
- Lifelong learning memory networks for aspect sentiment classification. In 2018 IEEE International Conference on Big Data (Big Data), pp. 861–870. Cited by: §2.
- Target-sensitive memory networks for aspect sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 957–967. Cited by: §2.
- Recursive neural conditional random fields for aspect-based sentiment analysis. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 616–626. Cited by: §2.
- Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §2.
- Attention-based lstm for aspect-level sentiment classification. In Proceedings of the 2016 conference on empirical methods in natural language processing, pp. 606–615. Cited by: §2.
- Memory networks. arXiv preprint arXiv:1410.3916. Cited by: §2.
- Double embeddings and CNN-based sequence labeling for aspect extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, pp. 592–598. Cited by: §1, §2, §4.1.
- BERT post-training for review reading comprehension and aspect-based sentiment analysis. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Cited by: §1, §2, §3, §4.2, §4.4, Table 5, Table 6, Table 7.
- Xlnet: generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pp. 5754–5764. Cited by: §2.