Synthetic dataset generation methodology for Recommender Systems using statistical sampling methods, a Multinomial Logit model, and a Fuzzy Inference System

12/29/2022
by Vitor T. Camacho, et al.

It is said that we live in the age of data, and that data is ubiquitous and readily available if one has the tools to harness it. That may well be true, but so is the opposite. It is increasingly common to start a data science project only to find oneself without quality data, whether because the needed features were never collected, because the data is insufficient, or because of legal constraints. When this happens, either the project is prematurely abandoned, or similar datasets are searched for and used. However, finding a dataset that meets one's needs in terms of features, type of ratings, etc., may not be an easy task; this is particularly the case for recommender systems. In this work, a methodology for the generation of synthetic datasets for recommender systems is presented, making it possible to overcome the obstacle of not having quality data readily available in sufficient amounts. With this methodology, one can generate a synthetic dataset for recommendation composed of numerical/ordinal and nominal features. The dataset is built with Gaussian copulas, Dirichlet and Gaussian distributions, a Multinomial Logit model and a Fuzzy Logic Inference System that generates the ratings according to different user behavioural profiles and perceived item quality.
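As a rough illustration of three of the building blocks named above, the following Python sketch draws correlated user features through a Gaussian copula, samples category shares from a Dirichlet distribution, and assigns a nominal feature with a Multinomial Logit (softmax) model. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature names (`age`, `income`, `segment`), the correlation matrix, and the coefficients in `beta` are all hypothetical, and the fuzzy inference step that produces ratings is not shown.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gaussian_copula_uniforms(corr, n, rng):
    """Draw n rows of correlated Uniform(0, 1) variables via a Gaussian copula."""
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    # The standard normal CDF maps each correlated normal to a uniform marginal.
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def mnl_probabilities(utilities):
    """Multinomial Logit choice probabilities: a row-wise softmax over utilities."""
    shifted = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(shifted)
    return expu / expu.sum(axis=1, keepdims=True)

n_users, n_categories = 1000, 3

# Assumed correlation between two hypothetical user features.
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
u = gaussian_copula_uniforms(corr, n_users, rng)
age = 18 + np.floor(u[:, 0] * 50)  # hypothetical ordinal feature
income = u[:, 1]                   # hypothetical numerical feature, scaled to [0, 1]

# Dirichlet draw: hypothetical shares of items across categories (sums to 1).
category_shares = rng.dirichlet(np.ones(n_categories))

# Multinomial Logit: utilities are linear in the features (hypothetical coefficients).
beta = np.array([[0.5, -0.2, 0.0],
                 [1.0, 0.3, -0.5]])  # shape: (2 features, 3 alternatives)
features = np.column_stack([u[:, 0], income])
probs = mnl_probabilities(features @ beta)

# Sample one nominal value per user from that user's choice probabilities.
segment = np.array([rng.choice(n_categories, p=p) for p in probs])
```

The copula step keeps each marginal uniform while preserving the dependence between features, so any target marginal distribution can be applied afterwards through its inverse CDF.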


