Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?

09/14/2023
by Anton Klenitskiy, et al.

Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, the two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of these publications, BERT4Rec achieves better performance than SASRec. However, BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, namely the one used by BERT4Rec, then SASRec significantly outperforms BERT4Rec in terms of both quality and training speed. In addition, we show that SASRec can be trained effectively with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.
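To make the contrast between the two objectives concrete, the following is a minimal sketch for a single prediction step, not the authors' implementation. It assumes the model has already produced one raw score (logit) per catalogue item. The function names, the uniform sampling of negatives, and the choice to sum (rather than average) the negative terms are assumptions made for illustration.

```python
import math
import random


def softmax_cross_entropy(scores, positive):
    """Cross-entropy over a softmax across all items: -log softmax(scores)[positive]."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sampled_bce(scores, positive, negatives):
    """Binary cross-entropy for one positive item and the given negative items.

    Assumption: the negative terms are summed, not averaged.
    """
    loss = -log_sigmoid(scores[positive])
    for j in negatives:
        loss -= log_sigmoid(-scores[j])
    return loss


def sample_negatives(num_items, positive, k, rng):
    """Draw k distinct negatives uniformly from all items except the positive (assumed scheme)."""
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


if __name__ == "__main__":
    rng = random.Random(0)
    num_items, positive = 1000, 42
    scores = [rng.gauss(0.0, 1.0) for _ in range(num_items)]  # stand-in for model logits

    print("full softmax CE:   ", softmax_cross_entropy(scores, positive))
    for k in (1, 16, 256):
        negs = sample_negatives(num_items, positive, k, rng)
        print(f"BCE, {k:>3} negatives:", sampled_bce(scores, positive, negs))
```

With a single negative, each step compares the positive against only one other item, whereas the softmax loss involves every item score; increasing `k` moves the sampled objective toward using more of the catalogue per step.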


