Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling

10/11/2021
by Ying-Chen Lin, et al.

The Softmax bottleneck was first identified in language modeling as a theoretical limit on the expressivity of Softmax-based models. As one of the most widely used ways to produce a probability distribution over a large output space, the Softmax-based model, a Softmax function on top of a final linear layer, has found a wide range of applications, including session-based recommender systems (SBRSs). The bottleneck has been shown to stem from rank deficiency in the final linear layer, owing to its connection with matrix factorization. In this paper, we show that there is more to the Softmax bottleneck in SBRSs. Contrary to the common belief that overfitting is a problem of complex networks, we find that overfitting also occurs in the final linear layer. Furthermore, we identify that the common technique of sharing item embeddings between the session sequences and the candidate pool creates a tight coupling that likewise contributes to the bottleneck. We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems. Our experiments show that it significantly improves the accuracy of a variety of Softmax-based SBRS algorithms. Compared with more computationally expensive methods such as MLP and MoS (Mixture of Softmaxes), our method performs on par with, and at times better than, those methods, while retaining the time complexity of plain Softmax-based models.
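To make the two ideas concrete, here is a minimal PyTorch sketch of a Softmax output head that (i) applies dropout right before the final linear projection and (ii) scores candidates with an embedding table decoupled from the one used to encode the session. This is an illustration under assumed shapes, not the authors' reference implementation; names such as DecoupledSoftmaxHead and the mean-pooling stand-in encoder are invented for this example.

```python
import torch
import torch.nn as nn


class DecoupledSoftmaxHead(nn.Module):
    """Softmax output layer with dropout on the final linear layer and a
    candidate-pool embedding table decoupled from the input-side table."""

    def __init__(self, num_items: int, dim: int, p_drop: float = 0.5):
        super().__init__()
        # Table used to embed items inside the session sequence.
        self.input_item_emb = nn.Embedding(num_items, dim)
        # Separate (decoupled) table scoring the candidate pool; a tied
        # ("coupled") model would reuse self.input_item_emb.weight here.
        self.output_item_emb = nn.Embedding(num_items, dim)
        self.dropout = nn.Dropout(p_drop)

    def encode(self, item_ids: torch.Tensor) -> torch.Tensor:
        # Stand-in encoder: mean-pool the input-side item embeddings.
        # A real SBRS would use a sequence model (e.g. GRU4Rec, SASRec).
        return self.input_item_emb(item_ids).mean(dim=1)

    def forward(self, session_repr: torch.Tensor) -> torch.Tensor:
        # session_repr: (batch, dim). Dropout here regularizes the final
        # linear layer, where the abstract argues overfitting occurs.
        h = self.dropout(session_repr)
        logits = h @ self.output_item_emb.weight.T  # (batch, num_items)
        return torch.log_softmax(logits, dim=-1)


# Usage: score all candidate items for a batch of 5-item sessions.
head = DecoupledSoftmaxHead(num_items=10_000, dim=64)
sessions = torch.randint(0, 10_000, (32, 5))
log_probs = head(head.encode(sessions))
print(log_probs.shape)  # torch.Size([32, 10000])
```

Note that this head keeps the same inference cost as a plain Softmax layer: decoupling only duplicates the embedding parameters, and dropout is free at evaluation time.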


Related research

11/10/2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
We formulate language modeling as a matrix factorization problem, and sh...

02/21/2019
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
The softmax function on top of a final linear layer is the de facto meth...

05/28/2018
Sigsoftmax: Reanalysis of the Softmax Bottleneck
Softmax is an output activation function for modeling categorical probab...

06/17/2022
SimA: Simple Softmax-free Attention for Vision Transformers
Recently, vision transformers have become very popular. However, deployi...

01/07/2022
On the Effectiveness of Sampled Softmax Loss for Item Recommendation
Learning objectives of recommender models remain largely unexplored. Mos...

08/19/2020
Query Twice: Dual Mixture Attention Meta Learning for Video Summarization
Video summarization aims to select representative frames to retain high-...
