## 1 Introduction

We live in an era of data explosion and information overload. Managing it would be impossible without intelligent systems that can process and filter huge amounts of information much faster than humans. The need for such systems was already recognized in the late 1970s in Usenet, a distributed discussion platform founded at Duke University. One of its goals was to help users manage numerous posts by grouping them into newsgroups. However, active research on information filtering started only in the 1990s. The general term Recommender Systems (RS) was brought to academia in the mid-1990s with the works of Resnick, Hill, Shardanand and Maes [4] and was preceded by several famous projects: Tapestry, Lotus Notes, GroupLens [12]. A significant boost in RS research came with the famous Netflix Prize competition, announced in 2006, with a $1 million award for the winners. It not only attracted a lot of attention from scientists and engineers, but also demonstrated the great interest from industry.

Conventional RS deal with two major types of entities, which are typically users (e.g. customers, consumers) and items (e.g. products, resources). Users interact with items by viewing or purchasing them, assigning ratings, leaving text reviews, placing likes or dislikes, etc. These interactions, also called events or transactions, create an observation history, typically collected in the form of a transaction/event log, that reflects the relations between users and items. Recognizing and *learning* these relations in order to predict new possible interactions is one of the key goals of RS.

As we will see further, the definition of entities is not limited to users and items only. Entities can be of practically any type, as long as predicting new interactions between them brings valuable knowledge and/or helps to make better decisions. In some cases entities can even be of the same type, as in the task of predicting new connections between people in a social network or recommending relevant citations for scientific papers.

Modern recommender models may also have to deal with more than 2 types of entities within a single system. For instance, users may want to assign tags (e.g. keywords) to the items they like. Tags become the third type of entity that relates to both users and items, as it represents the user's motivation and clarifies item relevance (more on that in Section 5.2). Time can be another example of an additional entity, as both user preferences and item relevance may depend on time (see Section 5.3). Taking into account these multiple relations between several entities typically helps to provide more relevant, dynamic and situational recommendations. It also increases the complexity of RS models, which in turn brings new challenges and opens the door for new types of algorithms, such as tensor factorization (TF) methods.

The topic of building a production-ready recommender system is very broad. It includes not only algorithms but also business logic, dataflow design, integration with infrastructure, service delivery and user experience. It may also require specific domain knowledge and always needs a comprehensive evaluation. Regarding the latter, the most appropriate way of assessing RS quality is online A/B testing and massive user studies [38, 27, 46], which are typically not readily available in academia. In this work we will only touch upon mathematical and algorithmic aspects, accompanied by examples from various application domains.

The rest of the survey is divided into the following parts: Sections 2 and 3 cover general concepts and major challenges in the RS field; Section 4 gives a brief introduction to tensor-related concepts that are essential for understanding how tensors can be used in RS models; Section 5 contains a comprehensive overview of various tensor-based techniques with examples from different domains; Section 6 concludes the review and provides thoughts on possible future directions.

## 2 Recommender systems at a glance

Let us consider, without loss of generality, the task of product recommendations. The main goal of this task is, given some prior information about users and items (products), to predict which particular items will be the most relevant to a selected user. The relevance is measured with some relevance score (or utility) function $f_R$ that is estimated from user feedback. More formally,

$$f_R: \text{User} \times \text{Item} \to \text{Relevance Score}, \tag{1}$$

where $\text{User}$ is a domain of all users and $\text{Item}$ is a domain of all items. The feedback can be either explicit or implicit, depending on whether it is directly provided by a user (e.g. ratings, likes/dislikes, etc.) or implicitly collected through an observation of his/her actions (e.g. page clicks, product purchases, etc.).

The type of prior information available to an RS model defines what class of techniques will be used for building recommendations. If only an observation history of interactions can be accessed, then this is a task for the collaborative filtering (CF) approach. If the RS model uses intrinsic properties of items as well as profile attributes of users in order to find the best matching (*user, item*) pairs, then this is a content-based (CB) approach.

The complete overview of RS methods and challenges is out of scope of this survey and for a deeper introduction we refer the reader to [27, 10, 84, 75].

### 2.1 Content-based filtering

As already mentioned, the general idea behind the CB approach is to use some prior knowledge about users' preferences and items' properties in order to generate the most relevant recommendations. One of its main advantages is the ability to alleviate the cold start problem (see Section 3.1) as long as all the needed content information is collected. Recommendations can be produced instantly even for those items that were never recommended to any user before.

This approach also has a number of issues, among which are limited content analysis, over-specialization and high sensitivity to users' input [4, 55]. The key drawback from a practical viewpoint is the difficulty of gathering descriptive and thorough properties of both items and users. This can be either done manually by humans, i.e. with the help of users and/or domain experts, or extracted automatically with data processing tools. The former method is usually very time consuming and requires a considerable amount of work before an RS can be built. The latter method is highly dependent on information retrieval (IR) algorithms and is not always accurate or even possible.

### 2.2 Collaborative filtering

In contrast to CB filtering, CF does not require any specific knowledge about users or items and only uses prior observations of users’ collective behavior in order to build new recommendations. The class of CF techniques is generally divided into two categories: memory-based and model-based methods [10, 5].

#### 2.2.1 Memory-based collaborative filtering

A widely used and very popular approach in this category is based on the *k Nearest Neighbours* (kNN) algorithm [37]. It finds relevance scores for any (*user, item*) pair by calculating contributions from its neighbors. The neighborhood is typically determined by a similarity between either users (user-based approach) or items (item-based approach) [79] in terms of some similarity measure. This is also called a similarity-based approach. In its simplest implementation, the method requires storing in memory all prior information about user-item interactions in order to make predictions.
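As a rough illustration of the item-based variant, the sketch below scores unseen items by summing similarities to the items a user has already interacted with. The toy matrix `A`, the function name `item_knn_scores`, the neighbourhood size `k` and the choice of cosine similarity are all assumptions made for this example; they are not taken from any of the cited methods.

```python
import numpy as np

def item_knn_scores(A, k=2):
    """Item-based kNN relevance scores for a (users x items) matrix A, k >= 1."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0                    # guard against items with no interactions
    S = (A.T @ A) / np.outer(norms, norms)     # cosine similarity between item columns
    np.fill_diagonal(S, 0.0)                   # an item is not its own neighbour
    # for every item (column), zero out all but the k largest similarities
    drop = np.argsort(S, axis=0)[:-k, :]
    np.put_along_axis(S, drop, 0.0, axis=0)
    return A @ S                               # aggregate contributions of the neighbours

# toy implicit feedback: 3 users, 4 items (hypothetical data)
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

scores = item_knn_scores(A, k=2)
scores[A > 0] = -np.inf                        # exclude items the user has already seen
top1 = scores.argmax(axis=1)                   # best unseen item per user
```

The whole interaction matrix is kept in memory and the similarity matrix is dense here, which is exactly the limitation that motivates the factorization models discussed later.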

Performance of the similarity models may be greatly impacted by the selected measure of similarity (or a distance measure). Cosine similarity, Jaccard index, Pearson correlation, Okapi BM25 [67] are a few examples of possible choices. Even though pure similarity-based models may give good recommendation quality in some application domains, factorization models (see Section 2.2.2) are better suited for large-scale problems often met in practice, delivering high performance and high quality recommendations [50, 10].

#### 2.2.2 Model-based collaborative filtering

In the model-based approach a predictive model is generated from a long enough history of observations and uses the collective behavior of the crowd (a “wisdom of crowds”) in order to extract general behavioral patterns. One of the most successful model-based approaches is matrix factorization (MF). The power of factorization models comes from the ability to embed users and items as vectors in a lower dimensional space of latent (or hidden) features (see Section 4.3). These models represent both users’ preferences and corresponding items’ features in a unified way so that the relevance score of the user-item interaction can be simply measured as an inner product of their vectors in the latent feature space.

As follows from the description, CF and CB tackle the problem of building relevant recommendations in very different ways and have their own sets of advantages and disadvantages. Many successful RS use *hybrid* approaches that combine the advantages of both methods within a single model [51, 13].

## 3 Challenges for recommender systems

Building a high quality RS is a complex problem that involves not only a certain level of scientific knowledge but also relies heavily on experience gained in industry from real-world implementations. This topic is also very broad, and we will briefly discuss only the most common challenges that are closely related to the initial model design and its algorithmic implementation.

### 3.1 Cold-start

Cold-start is the problem of handling new entities, and it concerns both users and items [27]. When a new user is introduced to the system, we usually know little or nothing about the user's preferences, which makes it difficult or impossible to predict any interesting items for him or her. A similar problem arises when a new item appears in a product catalog. If an item has no content description or has not been rated by any user, it will be impossible to build recommendations with this item.

### 3.2 Missing values

Users typically engage with only a small subset of items, and a considerable number of possible interactions stay unobserved. Apart from the trivial case of a lack of interest in specific items, there may be other reasons for not interacting with them. For example, users may simply be unaware of existing alternatives for the items of their choice. Finding out those reasons helps to make better predictions and, of course, is a part of the RS task. However, a high level of uncertainty may bring an undesirable bias against unobserved data or even prevent RS models from learning representative patterns, resulting in low recommendation quality.

There are several commonly used techniques that help to alleviate these issues and improve RS quality. In the MF case, simple regularization may prevent the undesired biases. Another effective technique is to assign some non-zero weights to the missing data, instead of completely ignoring it [44]. In hybrid models content information can be used to pre-process observations and assign non-zero relevance scores to some of the unobserved interactions (sometimes called *sparsity smoothing*). This new data is then fed into a standard CF procedure. Data clustering is another effective approach that is typically used to split the problem into a number of subproblems of smaller size with more connected information. Nevertheless, in the case of a particular MF method based on Singular Value Decomposition (SVD) [32], simply imputing zero relevance scores for unobserved values may produce better results [20, 53]. Additional smoothing can be achieved in that case with the help of a *kernel trick* [82]. Other missing value imputation techniques based on various data averaging and normalization methods are also possible [27]. As we will see in Section 5, all of these techniques are valid in the TF case as well.

### 3.3 Implicit feedback

In many real systems users are not motivated or not technically equipped to provide any information about their actual experience after interacting with an item. Hence, user preferences can only be inferred from implicit feedback, which does not necessarily reflect the actual user taste and cannot even tell with certainty whether the user likes or dislikes an item [44].

### 3.4 Model evaluation

Without a well-designed evaluation workflow and adequate quality measures it is impossible to build a reliable RS model that behaves equally well in both laboratory and production environments.
Moreover, there are many aspects of a model assessment beyond recommendations accuracy, that are related to both user experience and business goals. This can include metrics like *coverage, diversity, novelty, serendipity* (see [81] for explanations) and indicators such as total generated revenue or average revenue per user session.
This is still an open and ongoing research problem as it is not totally clear what are the most relevant and informative offline metrics and how to align them with the real online performance.

As mentioned in Section 1, the most reliable evaluation of RS performance is online testing and user studies. Researchers typically do not have access to production systems, so a number of offline metrics (mostly borrowed from the IR field) have become very popular. The most important among them are the relevance metrics: precision, recall, F1-score; and the ranking metrics: normalized discounted cumulative gain (NDCG), mean average precision (MAP), mean reciprocal rank (MRR), area under the ROC curve (AUC). These metrics may to some extent simulate a real environment, and in some cases have a strong correlation with business metrics (e.g. recall and clickthrough rates (CTR) [42]).
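To make two of these offline metrics concrete, here is a minimal sketch of precision, recall and NDCG computed for a single user over a ranked list cut at position `n`. Binary relevance is assumed, and the function names and the toy lists are hypothetical; real evaluation frameworks differ in details such as tie handling and averaging over users.

```python
import numpy as np

def precision_recall_at_n(recommended, relevant, n):
    """Precision@n and recall@n for one user; `relevant` is a non-empty set."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    return hits / n, hits / len(relevant)

def ndcg_at_n(recommended, relevant, n):
    """NDCG@n with binary gains and a log2 position discount."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:n]]
    dcg = sum(g / np.log2(pos + 2) for pos, g in enumerate(gains))
    # ideal ranking puts all relevant items at the top of the list
    ideal = sum(1.0 / np.log2(pos + 2) for pos in range(min(n, len(relevant))))
    return dcg / ideal

recommended = [10, 4, 7, 1, 3]   # ranked list produced by some model (toy data)
relevant = {4, 3, 8}             # held-out items the user actually interacted with

precision, recall = precision_recall_at_n(recommended, relevant, n=5)
ndcg = ndcg_at_n(recommended, relevant, n=5)
```

Precision and recall ignore the order of the hits inside the top-`n` list, whereas NDCG rewards placing relevant items closer to the top.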

It is also important to emphasize that while there are some real-world systems that target a direct prediction of a relevance score (e.g. rating), in most cases the main goal of RS is to build a good ranked list of items (top-$n$ recommendations task). This imposes some constraints on the evaluation techniques and model construction. It might be tempting to use and optimize for error-based metrics like root mean squared error (RMSE) or mean absolute error (MAE) due to their simplicity. However, good performance in terms of RMSE does not guarantee good performance in generating a ranked list of top-$n$ recommendations [27]. In other words, the predicted relevance score may not align well with the perceived quality of recommendations.

### 3.5 Reproducible results

The problem of reproducibility is closely related to recommendations quality evaluation. Careful design of evaluation procedures is critical for fair comparison of various methods. However, independent studies show that in controlled environments it is problematic to get consistent evaluation results even for the same algorithms on fixed datasets but within different platforms [77].

The situation gets even worse, taking into account that many models that tackle similar problems use different datasets (sometimes not publicly available), different data pre-processing techniques [23] or different evaluation metrics. In order to avoid unintended biases, we will focus mostly on the description of the key features of existing methods rather than on a side-by-side comparison of quantitative results.

### 3.6 Real-time recommendations

High quality RS are expected not only to produce relevant recommendations but also to respond instantly to system updates, such as new (or unrecognized) users, new items or new feedback [50]. Satisfying the latter requirement highly depends on the implementation: the predictive algorithms must have low computational complexity for producing new recommendations and take into account the dynamic nature of real environments. Recomputing the full RS model in order to include new entities may take prohibitively long, and the user may never see a recommendation before he or she leaves. This means that an RS application should be capable of making incremental updates and also be able to provide instant recommendations at a low computational cost outside of the full model recomputation loop. A number of techniques have been developed to fulfill these requirements for the MF case [30, 103, 11]. As will be shown in Section 5.2, these ideas can also be applied in the TF case.

### 3.7 Incorporating context information

In real world scenarios interactions between users and items exhibit a multifaceted nature. User preferences are typically not fixed and may change with respect to a specific situation. For example, buyers may prefer different goods depending on the season of the year or the time of day. A user may prefer to watch different movies when alone or in the company of friends. We will informally call these situational aspects that shape user behavior contextual information, or context for short (see Figure 1). Other examples of context are location, day of week, mood, the type of a user’s electronic device, etc. Essentially, it can be almost anything [8, 24].

Context-aware recommender systems (CARS) can be built with 3 distinct techniques [3]: contextual prefiltering, where a separate model is learned for every context type; contextual postfiltering, where adjustments are performed after a general context-unaware model has been built; and contextual modelling, where context becomes an essential part of the training process. The first two techniques may lose information about the interrelations within the context itself. Contextual modelling, in turn, extends the dimensionality of the problem and brings a multirelational aspect into it. Therefore it is likely to provide more accurate results [45]. Following (1), we can formalize it as follows:

$$f_R: \text{User} \times \text{Item} \times \text{Context}_1 \times \dots \times \text{Context}_N \to \text{Relevance Score}, \tag{2}$$

where $\text{Context}_i$ denotes one of $N$ contextual domains and the overall dimensionality of the model is $N + 2$.

As we will see further, TF models fit perfectly into the concept of CARS. With a very broad definition of context, tensor-based methods turn into a flexible tool that makes it possible to naturally model very interesting and non-trivial setups, where the concept of context goes beyond its typical connotation.

As a precaution, it should be noted that the nonspecificity of a context may lead to interpretability problems. Under a general definition of context, content information such as user profile attributes (e.g. age, gender) or item properties (e.g. movie genre or product category) can also be regarded as some type of context (see, for example, [45], where age and gender are used to build new context dimensions). However, in practice, especially for TF models, this mixing is typically avoided [98, 74]. One possible reason is the deterministic nature of content information, in contrast to what is usually denoted as context. Similarly to MF techniques, TF reveals new unseen associations (see Section 4.3), which in the case of deterministic attributes may be hard to interpret. This is easy to see in the following example.

For a triplet (*user, movie, gender*) the movie rating may be associated with only one of the two possible (*user, gender*) pairs, depending on the user’s actual gender. However, once a reconstruction (with the help of some TF technique) is made, a non-zero rating value may appear for both values of gender. The interpretation of such an association may become tricky and highly depends on the initial problem formulation.

## 4 Introduction to tensors

In this section we will only briefly introduce some general concepts needed for better understanding of further material. For a deeper introduction to the key mathematical aspects of multilinear algebra and tensor factorizations we refer the reader to [48, 19, 35]. As in the case of MF in RS, TF produces a predictive model by revealing patterns from the data. The major advantage of a tensor-based approach is the ability to take into account a multifaceted nature of user-item interactions.

### 4.1 Definitions and notations

We will regard an array of numbers with more than 2 dimensions as a *tensor*. This is a natural extension of matrices to a higher order case. A tensor with $m$ distinct dimensions or *modes* is called an $m$-way tensor or a tensor of order $m$.

Without loss of generality and for the sake of simplicity we will start our considerations with 3rd order tensors to illustrate some important concepts. We will denote tensors with calligraphic capital letters, e.g. $\mathcal{A} \in \mathbb{R}^{M \times N \times K}$ stands for a 3rd order tensor of real numbers with dimensions of sizes $M, N, K$. We will also use a compact form $\mathcal{A} = [a_{ijk}]_{i,j,k=1}^{M,N,K}$, where $a_{ijk}$ is an element or entry at position $(i, j, k)$, and will assume everywhere in the text the values of the tensor to be real.

##### Tensor fibers.

A generalization of matrix rows and columns to a higher order case is called a *fiber*. A fiber is a sequence of elements along one mode, obtained by fixing all indices but one. Thus, a mode-1 fiber of a tensor is equivalent to a matrix column, and a mode-2 fiber of a tensor corresponds to a matrix row. A mode-3 fiber in a tensor is also called a tube.

##### Tensor slices.

Another important concept is a tensor *slice*. Slices can be obtained by fixing all but two indices in a tensor, thus forming a two-dimensional array, i.e. a matrix. In a third order tensor there can be 3 types of slices: horizontal, lateral, and frontal, which are denoted as $\mathcal{A}_{i::}$, $\mathcal{A}_{:j:}$ and $\mathcal{A}_{::k}$ respectively.
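Fibers and slices map directly onto array indexing. The short sketch below uses NumPy (0-based indices, so "mode-1" corresponds to axis 0); the tensor `A` and the variable names are illustrative only.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)   # toy 3rd order tensor of size 2 x 3 x 4

# fibers: fix all indices but one
column_fiber = A[:, 0, 0]            # mode-1 fiber, length 2
row_fiber    = A[0, :, 0]            # mode-2 fiber, length 3
tube_fiber   = A[0, 0, :]            # mode-3 fiber (tube), length 4

# slices: fix all indices but two
horizontal = A[0, :, :]              # 3 x 4 matrix
lateral    = A[:, 0, :]              # 2 x 4 matrix
frontal    = A[:, :, 0]              # 2 x 3 matrix
```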

##### Matricization.

Matricization is a key term in tensor factorization techniques. This is a procedure of reshaping a tensor into a matrix. Sometimes it is also called unfolding or flattening. We will follow the definition introduced in [48]. The $n$-mode matricization $A^{(n)}$ of a tensor $\mathcal{A} \in \mathbb{R}^{M \times N \times K}$ arranges the mode-$n$ fibers to be the columns of the resulting matrix (see Figure 2). For the 1-mode matricization $A^{(1)}$ the resulting matrix size is $M \times NK$, for the 2-mode matricization $A^{(2)}$ the size is $N \times MK$ and the 3-mode matricization $A^{(3)}$ has the size $K \times MN$. In the general case of an $m$-th order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_m}$ the $n$-mode matricization $A^{(n)}$ will have the size $I_n \times \prod_{i \neq n} I_i$. For the corresponding index mapping rules we refer the reader to [48].
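A possible NumPy sketch of matricization and its inverse is given below. The helper names `unfold` and `fold` are assumptions of this example; Fortran-style ordering is used so that, among the remaining modes, the earlier indices vary fastest along the columns, which is the convention we assume here for the index mapping of [48].

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization (0-based `mode`): mode-n fibers become columns."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def fold(M, mode, shape):
    """Inverse of `unfold`: restore a tensor of the given shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(M, full, order="F"), 0, mode)

T = np.arange(24).reshape(2, 3, 4)   # toy tensor with M=2, N=3, K=4
A1, A2, A3 = unfold(T, 0), unfold(T, 1), unfold(T, 2)
# sizes: 2 x 12, 3 x 8 and 4 x 6 respectively
```

Any other consistent ordering of the columns would work equally well for the factorization methods discussed later, as long as `unfold` and `fold` agree with each other.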

##### Diagonal tensors.

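Another helpful concept is a diagonal tensor. A tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_m}$ is called diagonal if $a_{i_1 i_2 \dots i_m} \neq 0$ only if $i_1 = i_2 = \dots = i_m$. This concept helps to build a connection between different kinds of tensor decompositions.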

### 4.2 Tensor Factorization techniques

The concept of TF can be better understood via an analogy with MF. For this reason we will first introduce a convenient notation and representation for the MF case and then generalize it to a higher order.

### 4.3 Dimensionality reduction as a learning task

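Let us first start with SVD, as it helps to illustrate some important concepts and also serves as a workhorse for certain TF techniques. Any matrix $A \in \mathbb{R}^{M \times N}$ can be represented in the form:

$$A = U \Sigma V^T, \tag{3}$$

where $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{N \times N}$ are orthogonal matrices, $\Sigma \in \mathbb{R}^{M \times N}$ is a diagonal matrix of non-negative singular values $\sigma_1 \geq \dots \geq \sigma_K$ and $K = \min(M, N)$ is a *rank* of SVD. According to the Eckart-Young theorem [26], the truncated SVD of rank $r < K$ with $\sigma_{r+1}, \dots, \sigma_K$ set to 0 gives the best rank-$r$ (also called low-rank) approximation of matrix $A$. This has a number of important implications for RS models.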

A typical user-item matrix in RS represents a snapshot of real “noisy” data and it is practically never of low rank. However, the collective behavior has some common patterns which yield a low-rank structure, and the real data can be modelled as:

$$A = R + E,$$

where $E$ is a “noise” and $R$ is a rank-$r$ approximation of the data matrix $A$. The task of building a recommendation model translates into the task of recovering $R$ (or equivalently, minimizing the noise $E$). For illustration purposes here we assume that missing data in $A$ is replaced with zeroes (other ways of dealing with the missing values problem are briefly described in Section 3.2). Despite the simplicity of the assumption this approach is known to serve as a strong baseline [20, 53]. The Eckart-Young theorem states that an optimal solution to the resulting optimization task

$$\|A - R\|^2 \to \min \tag{4}$$

is given by the truncated SVD:

$$R = U_r \Sigma_r V_r^T. \tag{5}$$

Here and further in the text we use $\|\cdot\|$ to denote the Frobenius norm (both for the matrix and the tensor case), if not specified otherwise.

In terms of RS, the factor matrices $U_r \in \mathbb{R}^{M \times r}$ and $V_r \in \mathbb{R}^{N \times r}$, learned from observations, represent an embedding of users and items into the reduced latent space with $r$ latent features. The dimensionality reduction produces a “denoised picture” of data: it reveals a hidden (latent) structure that describes the relations between users and items. With this latent representation some previously unobserved interactions can be uncovered and used to generate recommendations. This idea can be extended to the case of higher order relations between more than 2 entities, and that is where tensor factorization techniques come into play.
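The following sketch illustrates this idea with NumPy on a small random matrix: missing entries are imputed with zeros, the data is approximated by a truncated SVD, and unseen items are ranked by the reconstructed scores. The data, the rank `r = 2` and the helper name `truncated_svd` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy implicit feedback matrix (6 users x 5 items), zeros stand for unobserved entries
A = (rng.random((6, 5)) > 0.6).astype(float)

def truncated_svd(A, r):
    """Return rank-r factors and the full spectrum of singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :].T, s

U_r, s_r, V_r, s_all = truncated_svd(A, r=2)
R = (U_r * s_r) @ V_r.T                      # low-rank model of the data

# by the Eckart-Young theorem the squared Frobenius error equals
# the sum of squared discarded singular values
approx_error = np.linalg.norm(A - R) ** 2

scores = np.where(A > 0, -np.inf, R)         # hide already observed interactions
top_item = scores.argmax(axis=1)             # one recommendation per user
```

Rows of `U_r` and `V_r` play the role of user and item embeddings, and each predicted score is an inner product of such embeddings weighted by the singular values.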

From now on we will omit the subscript $r$ in the equations for both matrix and tensor factorizations, e.g. we will denote a factor matrix $U_r$ simply as $U$ and likewise for other factor matrices. Let us also rewrite (3) in a form that is useful for further generalization to the higher order case:

$$R = \Sigma \times_1 U \times_2 V, \tag{6}$$

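where $\times_n$ is an *n-mode product*, which is typically defined for a product between a higher order tensor and a matrix.
In the matrix case the $n$-mode product between two matrices $A$ and $B$ has the following form (assuming they are conformable):

$$A \times_1 B = BA, \qquad A \times_2 B = AB^T.$$

In a more general case each resulting element of an $n$-mode product between a tensor $\mathcal{A}$ and a matrix $U$ is calculated as follows [48]:

$$(\mathcal{A} \times_n U)_{i_1 \dots i_{n-1}\, j\, i_{n+1} \dots i_m} = \sum_{i_n} a_{i_1 i_2 \dots i_m} u_{j i_n}. \tag{7}$$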
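In code, the $n$-mode product amounts to contracting one axis of the tensor with the second axis of the matrix. Below is a minimal NumPy sketch; the helper name `mode_product` and the 0-based `mode` argument are choices of this example.

```python
import numpy as np

def mode_product(T, U, mode):
    """n-mode product T x_n U: sum over index `mode` of T and the columns of U."""
    out = np.tensordot(U, T, axes=(1, mode))   # contracted result, rows of U come first
    return np.moveaxis(out, 0, mode)           # put the new index back in place of `mode`

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B1 = rng.standard_normal((5, 3))               # conformable with the first mode of A
B2 = rng.standard_normal((5, 4))               # conformable with the second mode of A

left  = mode_product(A, B1, 0)                 # matrix case: equals B1 @ A
right = mode_product(A, B2, 1)                 # matrix case: equals A @ B2.T

T = rng.standard_normal((3, 4, 2))
W = rng.standard_normal((6, 2))
TW = mode_product(T, W, 2)                     # tensor of size 3 x 4 x 6
```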

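For the same purpose of further generalization we will rewrite (6) in 2 other forms, the index form:

$$r_{ij} = \sum_{\alpha=1}^{r} \sigma_\alpha u_{i\alpha} v_{j\alpha},$$

and a sum of rank-1 terms:

$$R = \sum_{\alpha=1}^{r} \sigma_\alpha\, \mathbf{u}_\alpha \otimes \mathbf{v}_\alpha, \tag{8}$$

where $\mathbf{u}_\alpha$, $\mathbf{v}_\alpha$ denote columns of the factor matrices, e.g. $U = [\mathbf{u}_1 \dots \mathbf{u}_r]$, $V = [\mathbf{v}_1 \dots \mathbf{v}_r]$, and $\otimes$ denotes the vector outer product (or dyadic product).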

In the tensor case we will also be interested in the task of learning a factor model from real observation data $\mathcal{A}$. This turns into a dimensionality reduction problem that gives a suitable (not necessarily the best in terms of error-based metrics) approximation:

$$\mathcal{A} \approx \mathcal{R},$$

where $\mathcal{R}$ is calculated with the help of one of the tensor decomposition methods described further. We will keep this notation throughout the text, e.g. $\mathcal{A}$ will always be used to denote real data and $\mathcal{R}$ will always be used to represent the reconstructed model, learned from $\mathcal{A}$.

#### 4.3.1 Candecomp/Parafac

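The most straightforward way of extending SVD to higher orders is to add new factors in (8). In the third order case this will have the following form:

$$\mathcal{R} = \sum_{\alpha=1}^{r} \lambda_\alpha\, \mathbf{u}_\alpha \otimes \mathbf{v}_\alpha \otimes \mathbf{w}_\alpha, \tag{9}$$

where each summation component $\mathbf{u}_\alpha \otimes \mathbf{v}_\alpha \otimes \mathbf{w}_\alpha$ is a *rank-1* tensor. We can also equivalently rewrite (9) in a more concise notation:

$$\mathcal{R} = [\![\boldsymbol{\lambda};\, U, V, W]\!], \tag{10}$$

where $\boldsymbol{\lambda}$ is a vector of length $r$ with elements $\lambda_1 \geq \dots \geq \lambda_r > 0$ and $U \in \mathbb{R}^{M \times r}$, $V \in \mathbb{R}^{N \times r}$, $W \in \mathbb{R}^{K \times r}$ are defined similarly to (8). The expression assumes that the factors are normalized. As we will see further, in some cases the values of $\boldsymbol{\lambda}$ can have a meaningful interpretation. However, in general, the assumption can be safely omitted, which yields:

$$\mathcal{R} = [\![U, V, W]\!], \tag{11}$$

or in the index form:

$$r_{ijk} = \sum_{\alpha=1}^{r} u_{i\alpha} v_{j\alpha} w_{k\alpha}. \tag{12}$$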

The right-hand side of (11) gives an approximation of real observation data and is called the Candecomp/Parafac (CP) decomposition of a tensor $\mathcal{A}$. Despite being similar to formulation (8), there are a number of substantial differences in the concepts of tensor rank and low-rank approximation, thoroughly explained in [48]. Apart from technical considerations, an important conceptual difference is that there is no higher order extension of the Eckart-Young theorem (mentioned in the beginning of Section 4.3), i.e. if an exact low-rank decomposition of $\mathcal{A}$ with rank $r'$ is known, then its truncation to the first $r < r'$ terms may not give the best rank-$r$ approximation. Moreover, the optimization task in terms of low-rank approximation is ill-posed [22], which is likely to lead to numerical instabilities and issues with convergence, unless additional constraints on factor matrices (e.g. orthogonality, non-negativity, etc.) are imposed.
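The index form (12) translates almost literally into an `einsum` call. The sketch below assembles a third order tensor from CP factors; the function name `cp_to_tensor`, the optional weight vector `lam` and the random factors are assumptions of this example.

```python
import numpy as np

def cp_to_tensor(U, V, W, lam=None):
    """Sum of rank-1 terms lam[a] * u_a (outer) v_a (outer) w_a."""
    if lam is None:
        lam = np.ones(U.shape[1])             # unnormalized form, weights absorbed in factors
    return np.einsum("a,ia,ja,ka->ijk", lam, U, V, W)

rng = np.random.default_rng(0)
r = 2                                         # number of rank-1 terms
U = rng.standard_normal((4, r))               # e.g. user factors
V = rng.standard_normal((3, r))               # e.g. item factors
W = rng.standard_normal((2, r))               # e.g. context factors

R = cp_to_tensor(U, V, W)                     # tensor of size 4 x 3 x 2
score = R[1, 2, 0]                            # predicted relevance of one (i, j, k) triplet
```

Storing `U`, `V` and `W` instead of the full tensor is what makes the model compact: a single score can be computed directly as `np.sum(U[i] * V[j] * W[k])` without ever forming `R`.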

#### 4.3.2 Tucker decomposition

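A stable way of extending SVD to a higher order case, known as the Tucker decomposition (TD), is to transform the diagonal matrix $\Sigma$ from (6) into a third order tensor $\mathcal{G}$ and add an additional mode-3 tensor product (defined by (7)) with a new factor matrix $W$:

$$\mathcal{R} = \mathcal{G} \times_1 U \times_2 V \times_3 W, \tag{13}$$

where $U \in \mathbb{R}^{M \times r_1}$, $V \in \mathbb{R}^{N \times r_2}$, $W \in \mathbb{R}^{K \times r_3}$ are orthogonal matrices, having a similar meaning of latent feature matrices as in the case of SVD. Tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ is called a core tensor of the TD and a tuple of numbers $(r_1, r_2, r_3)$ is called a multilinear rank. The decomposition is not unique; however, the optimization problem with respect to multilinear rank is well-posed. Note that if tensor $\mathcal{G}$ is diagonal with all ones on its diagonal, then the decomposition turns into CP (11). In the index notation TD takes the following form:

$$r_{ijk} = \sum_{\alpha=1}^{r_1} \sum_{\beta=1}^{r_2} \sum_{\gamma=1}^{r_3} g_{\alpha\beta\gamma}\, u_{i\alpha} v_{j\beta} w_{k\gamma}. \tag{14}$$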
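The index form (14) can be sketched in NumPy as a single contraction of the core with the three factor matrices. The function name `tucker_to_tensor` and the random inputs are illustrative; the second part of the example checks numerically that a diagonal core with ones on its diagonal reduces the model to the CP form.

```python
import numpy as np

def tucker_to_tensor(G, U, V, W):
    """Reconstruct a tensor from a Tucker core G and factor matrices U, V, W."""
    return np.einsum("abc,ia,jb,kc->ijk", G, U, V, W)

rng = np.random.default_rng(0)
r = 2
U = rng.standard_normal((4, r))
V = rng.standard_normal((3, r))
W = rng.standard_normal((2, r))

# general (dense) core: every triple of latent features may interact
G = rng.standard_normal((r, r, r))
R_tucker = tucker_to_tensor(G, U, V, W)

# diagonal core with ones on the diagonal: only matching latent features interact
G_diag = np.zeros((r, r, r))
G_diag[np.arange(r), np.arange(r), np.arange(r)] = 1.0
R_cp_like = tucker_to_tensor(G_diag, U, V, W)
R_cp = np.einsum("ia,ja,ka->ijk", U, V, W)    # CP reconstruction from the same factors
```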

The definition of TD is not restricted to 3 modes only. Generally, the number of modes is not limited; however, storage requirements depend exponentially on the number of dimensions (see Table 1), which is often referred to as a *curse of dimensionality*. This imposes strict limitations on the number of modes for many practical cases, whenever more than 4 entities are modelled in a multilinear way (e.g. *user*, *item*, *time*, *location*, *company* or any other context variables, see Figure 1). In order to break the curse of dimensionality, a number of efficient methods have been developed recently, namely Tensor Train (TT) [64] and Hierarchical Tucker (HT) [34]. However, we are not aware of any published results related to TT- or HT-based implementations in RS.

Table 1: Storage requirements of tensor decompositions for a tensor with $d$ modes of equal size $n$ and all ranks equal to $r$.

| | CP | TD | TT | HT |
|---|---|---|---|---|
| storage | $dnr$ | $dnr + r^d$ | $dnr^2$ | $dnr + dr^3$ |

#### 4.3.3 Optimization algorithms

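Let us start from the simplest form of an optimization task where the objective $J$ is defined by a loss function $L$ as follows:

$$J(\theta) = L(\mathcal{A}, \mathcal{R}(\theta)), \tag{15}$$

where $\theta$ denotes model parameters, i.e. $\theta := \{U, V, W\}$ for CP-based models and $\theta := \{\mathcal{G}, U, V, W\}$ in the case of TD. The optimization criterion takes the following form:

$$\theta^* = \arg\min_{\theta} J(\theta), \tag{16}$$

where $\theta^*$ defines an optimal set of model parameters which are to be used to generate recommendations.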

It is convenient to distinguish three general categories of optimization objectives that lead to different ranking mechanisms: *pointwise*, *pairwise* and *listwise* [14].
A pointwise objective depends on a pointwise loss function $l(a_{ijk}, r_{ijk})$ between the observations $a_{ijk}$ and the predicted values $r_{ijk}$. In the case of a square loss the total loss function has a form similar to (4):

$$L(\mathcal{A}, \mathcal{R}) = \|\mathcal{A} - \mathcal{R}\|^2. \tag{17}$$

A pairwise objective depends on a pairwise comparison of the predicted values and penalizes those cases when their ordering does not correspond to the ordering of observations. The total loss can take the following form in that case:

$$L(\mathcal{A}, \mathcal{R}) = \sum_{i,k}\ \sum_{j, j':\, a_{ijk} > a_{ij'k}} l(r_{ijk} - r_{ij'k}),$$

where $l$ is a pairwise loss function that decreases with the increase of $r_{ijk} - r_{ij'k}$.
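To give a feeling for how a pairwise objective behaves, the sketch below evaluates such a loss for a single user and a fixed context. The logistic function $l(x) = \log(1 + e^{-x})$ is used here only as one possible decreasing loss (an assumption of this example, not a choice prescribed by the text), and the function name and toy vectors are hypothetical.

```python
import numpy as np

def pairwise_logistic_loss(a, r):
    """Sum of log(1 + exp(-(r_j - r_j'))) over item pairs with a_j > a_j'."""
    obs_diff = a[:, None] - a[None, :]        # observed preference differences
    pred_diff = r[:, None] - r[None, :]       # predicted score differences
    mask = obs_diff > 0                       # pairs where item j is preferred over j'
    return np.sum(np.logaddexp(0.0, -pred_diff[mask]))

a = np.array([3.0, 1.0, 2.0])                 # observed feedback for three items
r_good = np.array([0.9, 0.1, 0.5])            # predictions that preserve the observed order
r_bad = np.array([0.1, 0.9, 0.5])             # predictions that reverse the order

loss_good = pairwise_logistic_loss(a, r_good)
loss_bad = pairwise_logistic_loss(a, r_bad)
```

Only the relative order of predictions matters: adding a constant to all predicted scores leaves the loss unchanged, which is not the case for a pointwise squared error.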

A listwise objective operates over whole sets (or lists) of predictions and observations. The listwise loss function, which can be schematically expressed as $l(\{a_{ijk}\}, \{r_{ijk}\})$, penalizes the difference between the predicted ranking of a given list of items and the ground truth ranking, known from observations.

##### Pointwise algorithms for TD.

In case of TD-based model the solution to (16) given (17) can be found with help of two well-known methods proposed in [21]: Higher-Order SVD (HOSVD) [86, 90, 68] or Higher-Order Orthogonal Iteration (HOOI) [105, 88].

The HOSVD method can be described as a consecutive application of SVD to all 3 matricizations of $\mathcal{A}$, i.e. $A^{(1)}, A^{(2)}, A^{(3)}$ (assuming that missing data is imputed with zeros). Generally it produces a suboptimal solution to the optimization problem induced by (15); however, it is worse than the best possible solution only by a factor of $\sqrt{d}$, where $d$ is the number of dimensions [36]. Due to its simplicity this method is often used in the recommender systems literature.
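A compact sketch of the truncated HOSVD for a dense third order tensor is shown below. It assumes zero-imputed data held in a NumPy array; the helper names `unfold` and `hosvd` are choices of this example, and a practical implementation for large sparse data would rely on sparse, truncated SVD routines instead of a full `np.linalg.svd`.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization (0-based `mode`)."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def hosvd(A, ranks):
    """Truncated HOSVD: leading left singular vectors of every unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        left, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(left[:, :r])
    U, V, W = factors
    # core tensor: projection of the data onto the three factor subspaces
    G = np.einsum("ijk,ia,jb,kc->abc", A, U, V, W)
    return G, U, V, W

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5, 4))            # toy dense data tensor
G, U, V, W = hosvd(A, ranks=(3, 3, 2))
R = np.einsum("abc,ia,jb,kc->ijk", G, U, V, W)
```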

The HOOI method uses an iterative procedure based on an alternating least squares (ALS) technique, which successively optimizes (15). In practice it may require only a small number of iterations to converge, but in general it is not guaranteed to find a global optimum [48]. The choice between these two methods for a particular problem may require additional investigation in terms of both computational efficiency and recommendation quality before the final decision is made.
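The alternating scheme can be sketched as follows: each factor is updated in turn from the SVD of the data tensor projected onto the subspaces of the other two factors. This is a simplified dense version with a fixed number of sweeps and HOSVD initialization; the function names and the stopping rule are assumptions of this example rather than a reference implementation.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization (0-based `mode`)."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def leading_vectors(M, r):
    """First r left singular vectors of a matrix."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def hooi(A, ranks, n_iter=10):
    """Higher-order orthogonal iteration for a dense 3rd order tensor."""
    # initial guess: HOSVD factors
    factors = [leading_vectors(unfold(A, m), r) for m, r in enumerate(ranks)]
    for _ in range(n_iter):
        for m in range(3):
            B = A
            for k in range(3):
                if k != m:
                    # project mode k onto the current subspace of factor k
                    B = np.moveaxis(np.tensordot(factors[k].T, B, axes=(1, k)), 0, k)
            factors[m] = leading_vectors(unfold(B, m), ranks[m])
    U, V, W = factors
    G = np.einsum("ijk,ia,jb,kc->abc", A, U, V, W)
    return G, U, V, W

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5, 4))            # toy dense data tensor
G, U, V, W = hooi(A, ranks=(2, 2, 2), n_iter=10)
```

Calling `hooi` with `n_iter=0` returns the plain HOSVD solution, which makes it easy to compare the approximation error of the two methods on the same data.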

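The orthogonality constraints imposed by TD may in some cases have no specific interpretation. Relaxing these constraints leads to a different optimization scheme, typically based on gradient methods, such as stochastic gradient descent (SGD) [45]. The objective in that case is expanded with a regularization term $\Omega(\theta)$:

$$J(\theta) = L(\mathcal{A}, \mathcal{R}(\theta)) + \Omega(\theta), \tag{18}$$

which is commonly expressed as follows:

$$\Omega(\theta) = \lambda_{\mathcal{G}} \|\mathcal{G}\|^2 + \lambda_U \|U\|^2 + \lambda_V \|V\|^2 + \lambda_W \|W\|^2, \tag{19}$$

where $\lambda_{\mathcal{G}}$, $\lambda_U, \lambda_V, \lambda_W$ are regularization parameters and usually $\lambda_U = \lambda_V = \lambda_W$.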

##### Pointwise algorithms for CP.

As noted in Section 4.3.1, the CP problem is generally ill-posed. If no specific domain knowledge can be employed to impose additional constraints, a common approach to alleviate the problem is to introduce regularization similarly to (19):

(20) |

Depending on the problem formulation, the regularization term may also have a more complex form both for CP (e.g. as in Section 5.3.1) and TD models. In general, regularization helps to ensure convergence and avoid degeneracy (e.g. when rank-1 terms become close to each other in absolute value while their magnitudes go to infinity with opposite signs [48]); however, it may lead to a sluggish rate of convergence [61]. In practice, many problems can still be solved with CP using variations of both ALS [42, 47] and gradient-based methods.
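One common way to combine ALS with regularization is to replace every least-squares update with a ridge-regularized one. The sketch below assumes a dense 3rd-order tensor, a single regularization weight `reg` shared by all factors, and a fixed number of sweeps; these choices and the function names are illustrative assumptions, not the exact scheme of the cited works.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization of a dense tensor."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def khatri_rao(a, b):
    """Column-wise Kronecker product; row index runs over (row of a, row of b)."""
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])

def cp_als(tensor, rank, reg=1e-3, n_iter=50, seed=0):
    """Regularized ALS for a 3rd-order CP model."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second) + reg * np.eye(rank)
            # ridge-regularized least squares for the factor of this mode
            factors[mode] = np.linalg.solve(gram, (unfold(tensor, mode) @ kr).T).T
    return factors

def cp_reconstruct(factors):
    """Sum of rank-1 terms built from the columns of the three factors."""
    u, v, w = factors
    return np.einsum('ir,jr,kr->ijk', u, v, w)

rng = np.random.default_rng(1)
x = cp_reconstruct([rng.standard_normal((n, 2)) for n in (6, 5, 4)])
factors = cp_als(x, rank=2)
rel_error = np.linalg.norm(x - cp_reconstruct(factors)) / np.linalg.norm(x)
```

Each block update minimizes the regularized objective exactly with the other factors fixed, so the objective cannot increase from sweep to sweep, although the speed of convergence depends on the data and on `reg`.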

##### Pairwise and listwise algorithms.

Pairwise and listwise methods are considered to be more advanced and accurate as they are specifically designed to solve ranking problems. The objective function is often derived directly from a definition of some ranking measure, e.g. pairwise AUC or listwise MAP (see [72] and [83] for CP-based and TD-based implementations respectively), or constructed in a way that is closely related to those measures [73, 74].

These methods typically have a non-trivial loss function with complex data interconnections within it, which makes it hard to optimize and tune. In practice, the complexity problem is often resolved with the help of handcrafted heuristics and problem-specific constraints (see Sections 5.2.2 and 5.4.2), which simplify the model and improve computational performance.

## 5 Tensor-based models in recommender systems

Treating data as a tensor may bring new levels of flexibility and/or quality into RS models; however, there are nuances that should be taken into account and treated properly. This section covers different tensorization techniques used to build advanced RS in various application domains. For all the examples we use, where possible, the unified notation introduced in Section 4, hence it might look different from the notation used in the original papers. This helps to reuse some concepts across different models and build a consistent narrative throughout the text.

### 5.1 Personalized search and resource recommendations

There is a very tight connection between personalized search and RS. Essentially, recommendations can be considered as a *zero query search* [6] and, in turn, a personalized search engine can be regarded as a query-based RS.

Personalized search systems aim at providing a better search experience by returning the most relevant results, typically web pages (or resources), in response to a user’s request. Clickthrough data (i.e. an event log of clicks on the search results after submitting a search query) can be used for this purpose, as it contains information about users’ actions and may provide valuable insights into search patterns. The essential part of this data is not just a web page that a user clicks on, but also a context: a query associated with every search request that carries a justification for the user’s choice. The utility function in that case can be formulated as:

where *Resource* denotes a set of web pages and *Query* is a set of keywords that can be specified by users in order to emphasize their current interests or information needs. In the simplest case a single query can consist of one or a few words (e.g. “jaguar” or “big cat”). More elaborate models could employ additional natural language processing tools in order to break down queries into a set of single keywords, e.g. a simple phrase “what are the colors of the rainbow” could be transformed into a set {“rainbow”, “color”} and further split into 2 separate queries, associated with the same (*user, resource*) pair.

#### 5.1.1 CubeSVD

One of the earliest and at the same time most illustrative works where this formulation was explored with the help of tensor factorization is CubeSVD [86]. The authors build a 3rd-order tensor whose values represent the level of association (the relevance score) between a user and a web page in the presence of a query:

where is an observation history, e.g. a sequence of events described by the triplets (*user, resource, query*). Note that authors in their work use simple queries without processing, e.g. “big cat” is a single query term.

The association level can be expressed in various ways, the simplest one is to measure a co-occurrence frequency , e.g. how many times a user has clicked on a specific page after submitting a certain query. In order to prevent an unfair bias towards the pages with high click rates, it can be restricted to have only values of 0 (no interactions) or 1 (at least one interaction). Or it can be rescaled with a logarithmic function:

where is a new normalized frequency and is, for example, an IDF (Inverse Document Frequency) measure of a web page. Another scaling approach can also be used.
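The alternatives above can be sketched as follows. The click log, the index encoding and the `log1p` damping are assumptions made for illustration; in particular, the sketch does not reproduce the IDF-weighted normalization used by the authors.

```python
import numpy as np

# hypothetical click log: (user, resource, query) index triplets, repeats allowed
log = [(0, 1, 0), (0, 1, 0), (0, 1, 0), (1, 2, 1), (2, 0, 1), (2, 0, 1)]
shape = (3, 3, 2)

counts = np.zeros(shape)
for user, resource, query in log:
    counts[user, resource, query] += 1   # co-occurrence frequency

binary = (counts > 0).astype(float)      # 0 = no interaction, 1 = at least one
log_scaled = np.log1p(counts)            # assumed logarithmic damping; zeros stay zero
```

A dense array is used only for readability; for realistic data sizes a sparse coordinate format would be needed.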

The authors proposed to model the data with a third order TD (13) and in order to find it they applied the HOSVD. Similarly to SVD (3),
factors and represent embedding of users, web pages and queries vectors into a lower-dimensional latent factors space with dimensionalities and correspondingly.
The core tensor defines the form and the strength of multilinear relations between all three entities in the latent feature space. Once the decomposition is found, the relevance score for any (*user, resource, query*) triplet can be recovered with (14).

With the introduction of new dimensions the data sparsity becomes even higher, which may lead to numerical instabilities and a general failure of the learning algorithm. In order to mitigate that problem, the authors propose several smoothing techniques: one based on value imputation with a small constant and another based on the content similarity of web pages. They report an improvement in the overall quality of the model after these modifications.

After applying the decomposition technique, the reconstructed tensor will contain new non-zero values denoting potential associations between users and web resources influenced by certain queries. The tensor values can be directly used to rank a list of the most relevant resources: the higher the value, the higher the relevance of the page to the user under the given query.
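Ranking by reconstructed values amounts to sorting one fiber of the tensor. The sketch assumes the reconstructed tensor is available as a dense array indexed as (user, resource, query); the function name and the toy values are hypothetical.

```python
import numpy as np

def top_resources(scores, user, query, n=3):
    """Indices of the n highest-scoring resources for one (user, query) pair."""
    column = scores[user, :, query]
    return np.argsort(-column, kind="stable")[:n]

# hypothetical reconstructed tensor: 2 users x 4 resources x 2 queries
scores = np.zeros((2, 4, 2))
scores[0, :, 1] = [0.1, 0.7, -0.2, 0.4]
ranked = top_resources(scores, user=0, query=1, n=2)
```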

This simple TF model does not contain a remedy for some of the typical RS problems such as cold start or real-time recommendations and is most likely to have issues with scalability. Nevertheless, this work is very illustrative and demonstrates the general concepts for building a tensor-based RS.

#### 5.1.2 TOPHITS

As has been discussed in Section 3.6, new entities can appear in the system dynamically and rapidly, which in the case of higher order models creates even more computational load, i.e. full recomputation of tensor decomposition quickly becomes infeasible and incremental techniques should be used instead. However, in some cases simply redefining the model might lower the complexity. As we mentioned in Section 2.2.1, a simple approach to reduce the model is to eliminate one of the entities with some sort of aggregation.

For example, instead of considering (*user, resource, query*) triplets we could work with aggregated (*resource, resource, query*) triplets, where every frontal slice of the tensor is simply an adjacency matrix of resources browsed together under a specific query. Users are therefore no longer explicitly stored, and their actions are recorded only in the form of a co-occurrence of the resources they searched for.
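One possible way to perform such an aggregation is sketched below. Counting every ordered pair of distinct resources opened by the same user under the same query is an assumption made for illustration; other aggregation rules are equally plausible.

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

# hypothetical log of (user, resource, query) index triplets
log = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 2, 0), (0, 2, 1)]
n_resources, n_queries = 3, 2

# group the distinct resources each user opened under each query
clicked = defaultdict(set)
for user, resource, query in log:
    clicked[(user, query)].add(resource)

adjacency = np.zeros((n_resources, n_resources, n_queries))
for (_, query), resources in clicked.items():
    for r1, r2 in permutations(sorted(resources), 2):
        adjacency[r1, r2, query] += 1  # co-occurrence count; the user index is dropped
```

Every frontal slice `adjacency[:, :, q]` is then a symmetric resource-by-resource matrix for query `q`.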

An example of such a technique is the TOPHITS model [49, 47]. This analogy requires an extra explanation, as the authors are not modelling users’ clicking behavior. The model is designed for web-link analysis of a static set of web pages referencing each other via hyperlinks. The data is collected by crawling those web pages and collecting not only links but also keywords associated with them. However, the crawler can be interpreted as a set of users browsing those sites by clicking on the hyperlinked keywords. This draws the connection between the CubeSVD and TOPHITS models, as the keywords can be interpreted as short search queries in that case. And, as we stated earlier, users (or crawlers) can be eliminated from the model by constructing an adjacency matrix of linked resources.

The authors of TOPHITS model extend an adjacency matrix of interlinked web pages with the collected keyword information and build a so called adjacency tensor , that encodes hubs, authorities and keywords. As has been mentioned, the keyword information is conceptually very similar to queries, hence it can be also modelled in a multirelational way. Instead of TD format the authors prefer to use CP in the form of (10) with and with ALS-based optimization.

The interpretation of this decomposition is different from the CubeSVD. As the authors demonstrate, the weights , have a straightforward semantic meaning as they correspond to a set of specific topics extracted from the overall web page collection. Accordingly, every triplet of vectors () represents a collection of hubs, authorities and keyword terms respectively, characterized by a topic . The elements with higher values in these vectors provide the best-matching candidates under the selected topic, which allows a better grouping of web pages within every topic and provide means for a personalization.

For example, as the authors show, a personalized ranked list of authorities can be obtained with:

(21) |

where is a diagonal matrix and is a user-defined query vector of length with elements if term belongs to the query and 0 otherwise, }. Similarly, a personalized list of hubs can be built simply by substituting factor with in (21).

The interpretation of tensor values might seem very natural; however, there is an important caveat. Generally, the restored tensor values may turn out to be either positive or negative, and in most applications negative values have no meaningful explanation. Non-negative tensor factorization (NTF) [18, 107] can be employed to resolve that issue (see an example in [16], and also the connection of NTF to a probabilistic factorization model under specific conditions [17]).

### 5.2 Social tagging

A remarkable amount of research is devoted to one specific domain, namely social tagging systems (STS), where predictions and recommendations are based on commonalities in social tagging behavior (also referred to as collaborative tagging). A comprehensive overview of the general challenges and state-of-the-art RS methods can be found in [58].

Tags carry complementary semantic information that helps STS users to categorize and organize items of their choice. This brings an additional level of interpretation of the user-item interactions, as it exposes the motives behind the user preferences and explains the relevance of particular items.
This observation suggests that tags play an important role in defining the relevance of (*user, item*) pairs, hence all the three entities should be modelled mutually in a multirelational way. The scoring function in that case can be defined as follows:

The triplets (*user, item, tag*), coming from an observation history , can be easily translated into a 3rd order tensor , where denote the number of users, items and tags respectively. Users are typically not allowed to assign the same tags to the same items more than once, hence the tensor values are strictly binary and defined as:

#### 5.2.1 Unified framework

As in the case of keywords and queries, tensor dimensionality reduction helps to uncover latent semantic structure of the ternary relations. The values of the reconstructed tensor can be interpreted as the likeliness or weight of new links between users, items and tags. These links might be used for building recommendations in various ways: help users assign relevant tags for items [93], find interesting new items [92], or even find like-minded users [87].

The model that is built on top of all three possibilities is described in [90]. The authors perform a latent semantic analysis on the data with help of the HOSVD. Generally, the base model is similar to CubeSVD (see Section 5.1): items can be treated as resources and tags as queries.

The authors also face the same problem with sparsity. The tensor matricizations within the HOSVD procedure produce highly sparse matrices, which may prevent the algorithm from learning an accurate model. In order to overcome that problem they propose a smoothing technique based on a *kernel trick*.

In order to deal with the problem of real-time recommendations (see Section 3.6) the authors adapt the well-known *folding-in method* [30] to the higher-order case. The folding-in procedure helps to quickly embed a previously unseen entity into the latent feature space without recomputing the whole model. For example, an update to the user feature matrix can be obtained with:

where is a new user information that corresponds to a *row* in the matricitized tensor ; is an already computed (during HOSVD step) matrix of right singular vectors of , and is a corresponding diagonal matrix of singular values; is an *update row* which is appended to the latent factor matrix . The resulting update to reconstructed tensor is computed with (see Figure 3):

where the term within the left brackets on the right-hand side does not contain any new values, i.e. it does not require full recomputation and can be pre-stored, which makes the update procedure much more efficient.
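The projection step can be illustrated with the classic matrix form of folding-in, in which a new row is multiplied by the right singular vectors and scaled by the inverse singular values. The sketch assumes this standard form applies to the mode-1 matricization; the random matrix stands in for real data and all names are hypothetical.

```python
import numpy as np

def fold_in(new_row, v, sigma):
    """Project a new row onto the latent space: u = p V Sigma^{-1}."""
    return (new_row @ v) / sigma

rng = np.random.default_rng(0)
x1 = rng.random((5, 12))                  # stand-in for the mode-1 matricization
u, s, vt = np.linalg.svd(x1, full_matrices=False)
rank = 3
u, s, v = u[:, :rank], s[:rank], vt[:rank].T

existing = fold_in(x1[2], v, s)           # folding in a known row recovers its factor row
new_user = fold_in(rng.random(12), v, s)  # latent row for a previously unseen user
```

Folding in a row that was already part of the data returns the corresponding row of the user factor, which is a convenient sanity check for an implementation.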

Nevertheless, this typically leads to a loss of orthogonality in the factors and negatively impacts the accuracy of the model in the long run. This can be avoided with an *incremental SVD update*, which for matrices with missing entries was initially proposed in [11]. As the authors demonstrate, it can also be adapted for tensors.

It should be noted, that this is not the only possible option for incremental updates. For example, a different incremental TD-based model with HOOI-based optimization is proposed in [105] for a highly dynamic, evolving environment (not related to tag-based recommendations). The authors of this work use an extension of a two-dimensional incremental approach from [76].

#### 5.2.2 RTF and PITF

The models reviewed so far share a common “1/0” interpretation scheme for missing values, i.e. all observed triplets are assumed to be positive feedback and all others (missing) are treated as negative feedback with zero relevance score. However, as the authors of the ranking with TF model (RTF) [72] and the more elaborate pairwise interaction TF (PITF) model [73] emphasize, all missing entries can be split into 2 groups: the true negative feedback and the unknown values. The true negatives correspond to those triplets where the user has interacted with the item but has assigned tags different from the given tag. More formally, given the set of all posts that correspond to all observed (*user, item*) interactions, the true negative feedback within any interaction is defined as:

trivially, true positive feedback is:

All other entries are unknowns and are to be uncovered by the model.
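The three groups can be made concrete with a small sketch. The toy observations, the tag vocabulary and the function name `feedback` are hypothetical and serve only to illustrate the post-based split described above.

```python
from collections import defaultdict

# hypothetical observation history of (user, item, tag) triplets
observed = [("ann", "song1", "rock"), ("ann", "song1", "live"), ("bob", "song2", "jazz")]
all_tags = {"rock", "live", "jazz", "pop"}

posts = defaultdict(set)  # (user, item) -> tags the user assigned to the item
for user, item, tag in observed:
    posts[(user, item)].add(tag)

def feedback(user, item):
    """Return (positive tags, negative tags) for a post, or None if it is unobserved."""
    if (user, item) not in posts:
        return None       # unknown: left for the model to predict
    positive = posts[(user, item)]
    return positive, all_tags - positive
```

For the post ("ann", "song1") the positive tags are "rock" and "live" and the negative ones are "jazz" and "pop", whereas ("ann", "song2") is an unknown post.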

Furthermore, both RTF and PITF models do not require any specific values to be imposed on either known or unknown entries. Instead they only impose pairwise ranking constraints on the reconstructed tensor values:

These post-based ranking constraints become an essential part of the optimization procedure. The RTF model uses the Tucker format; however, it aims at directly maximizing the AUC measure, which, according to the authors, takes the following form:

where are the parameters of the TD model (as defined in (18)) and

$\sigma(x)$ is a sigmoid function, introduced to make the term differentiable:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (22)$$

As we are interested in maximizing AUC and due to (16), the loss function takes the form:

The regularization term of the model is defined by (20).

The authors adopt a stochastic gradient descent (SGD) algorithm for solving the optimization task. However, as they state, directly optimizing the AUC objective is computationally infeasible. Instead, they exploit a smart trick of recombining and reusing precomputed summation terms within the objective and its derivatives. They use this trick for both tasks of learning and building recommendations.

The PITF model is built on top of ideas from the RTF model. It adapts the Bayesian Personalized Ranking (BPR) technique, proposed for the MF case in [70], to the ranking approach. The tag rankings for every observed post are not deterministically learned as in the RTF model but instead are derived from the observations by optimizing the maximum a posteriori estimate. This leads to an optimization objective similar to that of RTF, with similar regularization (excluding the tensor core term, which is not present in CP) and a slightly different loss function:

where the same notation as in RTF is used; is a sigmoid function from (22) and is a training data, i.e. a set of quadruples:

(23) |

An important difference of PITF from RTF is that the complexity of multilinear relations is significantly reduced by leaving only pairwise interactions between all entities. From the mathematical viewpoint it can be considered as a CP model with a special form of partially fixed factor matrices (cf. (12)):

(24) |

where and are the parts of the same matrix responsible for tags relation to users and items respectively; and are interactional parts of and .

The authors emphasize that the user-item interaction term does not contribute to the BPR-based ranking optimization scheme, which yields an even simpler equation that becomes an essential part of the PITF model:

(25) |

Another computational trick that helps to train the model even faster without sacrificing the quality is random sampling within the SGD routine. All the quadruples in corresponding to a post are highly overlapped with respect to the tags associated with them. Therefore, learning with some randomly sampled quadruples is likely to have a positive effect on learning the remaining ones.
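A single stochastic update on one sampled quadruple can be sketched as follows. The score is the sum of a user-tag and an item-tag inner product, and the update ascends the logarithm of the sigmoid of the score difference with L2 regularization. The array names, the learning rate `lr` and the weight `reg` are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_tags, k = 4, 5, 6, 3
user_f = 0.1 * rng.standard_normal((n_users, k))      # user factors
item_f = 0.1 * rng.standard_normal((n_items, k))      # item factors
tag_user_f = 0.1 * rng.standard_normal((n_tags, k))   # tag factors paired with users
tag_item_f = 0.1 * rng.standard_normal((n_tags, k))   # tag factors paired with items

def score(u, i, t):
    """Pairwise-interaction score: user-tag term plus item-tag term."""
    return user_f[u] @ tag_user_f[t] + item_f[i] @ tag_item_f[t]

def sgd_step(u, i, t_pos, t_neg, lr=0.05, reg=1e-4):
    """One BPR-style update on a sampled quadruple (u, i, t_pos, t_neg)."""
    diff = score(u, i, t_pos) - score(u, i, t_neg)
    w = 1.0 / (1.0 + np.exp(diff))                    # equals 1 - sigmoid(diff)
    du = tag_user_f[t_pos] - tag_user_f[t_neg]
    di = tag_item_f[t_pos] - tag_item_f[t_neg]
    u_old, i_old = user_f[u].copy(), item_f[i].copy()
    user_f[u] += lr * (w * du - reg * user_f[u])
    item_f[i] += lr * (w * di - reg * item_f[i])
    tag_user_f[t_pos] += lr * (w * u_old - reg * tag_user_f[t_pos])
    tag_user_f[t_neg] += lr * (-w * u_old - reg * tag_user_f[t_neg])
    tag_item_f[t_pos] += lr * (w * i_old - reg * tag_item_f[t_pos])
    tag_item_f[t_neg] += lr * (-w * i_old - reg * tag_item_f[t_neg])

before = score(0, 1, 2) - score(0, 1, 3)
for _ in range(200):
    sgd_step(0, 1, t_pos=2, t_neg=3)
after = score(0, 1, 2) - score(0, 1, 3)
```

In a full training loop the quadruple would be drawn at random from the training set at every step, in line with the sampling strategy described above; here one quadruple is repeated only to show that the positive tag is pushed above the negative one.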

In order to verify the correctness and effectiveness of such simplifications the authors conduct experiments with both BPR-tuned TD and CP and demonstrate that PITF algorithm achieves close or even better quality of recommendations while learning features faster than the other two TF methods.

Despite its computational effectiveness, the original PITF model lacks support for real-time recommendation scenarios, where rebuilding the full model for each new user, item or tag could be prohibitive. The authors of [54] overcome this limitation by introducing a folding-in procedure compatible with the PITF model and demonstrate its ability to provide high recommendation quality. It is worth noting that a number of variations of the folding-in technique are available for different TF methods, see [104].

The idea of modelling higher order relations in a joint pairwise manner similar to (25) has been explored in various application domains and is implemented in various settings, either straightforwardly or as a part of a more elaborate RS model [31, 39, 106, 80]. There are several generalized models [97, 74], [42] that also use this idea. They are covered in more detail in Sections 5.4.3 and 5.4.5 of this work.

#### 5.2.3 Improving the prediction quality

As already mentioned in Section 5.2.1, high data sparsity typically leads to less accurate predictive models. This problem is common across various RS domains. Another problem, specific to STS, is tag ambiguity and redundancy. The following are examples of some of the most common techniques developed to deal with these problems.

The authors of CubeRec [100] propose a clustering-based separation mechanism. This mechanism builds clusters of triplets (*user, item, tag*) based on the proximity of tags derived from the item-tag matrix. With this clustering some of the items and tags can belong to several clusters at the same time, according to their meaning. After that the initial problem is split into a number of smaller sub-problems (corresponding to clusters), each with denser data. Every sub-problem is then factorized with the HOSVD similarly to [86], and the resulting model is constructed as a combination of all the smaller TF models.

The authors of the clustering-based TD model (ClustHOSVD) [88] also employ a clustering approach. However, instead of splitting the problem, they replace tags by tag clusters and apply the HOOI method (which is named AlsHOSVD by the authors) directly to the modified data consisting of (*user, item, tag cluster*) triplets. They also demonstrate the effect of different clustering techniques on the quality of RS.

As can be seen, many models benefit from clustering either prior to or after the factorization step. This suggests that it can also be beneficial to perform simultaneous clustering and factorization. This idea is explored by the authors of [29], where they demonstrate the effectiveness of such an approach.

A further improvement can be achieved with hybrid models (see Section 2.2.2) that exploit content information and incorporate it into a tensor-based CF model. It should be noted, however, that there is no “silver-bullet” approach suitable for all kinds of problems, as it highly depends on the type of data used as a source of content information.

The authors of [60] exploit acoustic features for music recommendations in a tag-based environment. The features, extracted with specific audio-processing techniques, are used to measure the similarity between different music samples. The authors make an assumption that similarly sounding music is likely to have similar tags, which allows tags to be propagated to music that has not been tagged yet. With this assumption the data is augmented with new triplets of (*user, item, tag*), which leads to denser data and results in a better predictive quality of the HOSVD model.

The TF and tag clustering (TFC) model [68] combines both content exploitation and tag clustering techniques. The authors focus on the image recommendation problem, thus they use image processing techniques in order to find item similarities and propagate highly relevant tags. Once the tag propagation is completed, the authors find tag clusters (naming them topics) and build new association triplets (*user, item, topic*), which are further factorized with the HOSVD.

As a last remark in this section, the idea of model splitting, proposed in the CubeRec model, was also explored in a more general setup in [94]. The authors consider a multiple context environment, where user-item interactions may depend on various contexts such as location, time, activity, etc. This is generally modelled with a tensor of order higher than 3. Instead of dealing with a higher number of dimensions and greater sparsity, the authors propose to build a separate model for every context type, which transforms the initial problem into a collection of smaller problems of order 3. Then all the resulting TF models are combined with specific weights (based on the context influence measure proposed by the authors) and can be used to produce recommendations. However, despite the ability to better handle the sparsity issue, the model may lose some valuable information about the relations between different types of context. More general methods for multi-context problems are covered in Section 5.4.

### 5.3 Temporal models

User consumption patterns may change over time. For example, the interest of TV users may correlate not only with the topic of a TV program, but also with a specific time of the day. In retail, user preferences may vary depending on the season. Temporal models are designed to learn those time-evolving patterns in data by taking the time aspect into account, which can be formalized with the following scoring function:

Even though the general problem statement already looks familiar, when working with temporal data one should mind the difference between evolving and periodic (e.g. seasonal) events, which may require special treatment.

#### 5.3.1 BPTF

One of the models that exploits the periodicity of events is the Bayesian Probabilistic TF (BPTF) [99]. It uses seasonality to reveal trends in retail data and predict the orders that will arrive in the ongoing season based on the season’s start and previous purchasing history. The key feature of the model is the ability to produce forecasts of the sales of new products that were not present in previous seasons. The model captures dynamic changes in both product designs and customers’ preferences solely from the history of transactions and does not require any kind of expert knowledge.

The authors develop a probabilistic latent factors model by introducing priors on the parameters; i.e. the latent feature vectors are allowed to vary and the variance of relevance scores is assumed to follow a Gaussian distribution:

where is an observations precision and denotes a right hand side of (12). Note, that in the original work the authors use a transposed version of the factor matrices, e.g. any column of the factor in their work represents a single user, the same holds for two other factors.

In order to prevent the overfitting the authors also impose prior distributions on and :

Furthermore, the formulation for the time factor takes into account its evolving nature and implies smooth changes in time:

The time factor rescales the user-item relevance score with respect to the time-evolving trends and the probabilistic formulation helps to account for the users who do not follow those trends.

The authors show that maximizing the log-posterior distribution with respect to and is equivalent to an optimization task with the weighted square loss function:

(26) |

and a bit more complex regularization term:

where and the last two terms are due to a dynamic problem formulation. The number of parameters of this model makes the task of optimization almost infeasible. However, the authors come up with an elaborate MCMC-based integration approach, that makes the model almost parameter-free and also scales well.

#### 5.3.2 TCC

The authors of the TF-based subspace clustering and preferences consolidation model (TCC) [96] exploit the periodicity in usage patterns of IPTV users in order to, first, identify individual users and, second, provide them with more relevant recommendations, even if those users share the same IPTV account (for example, across all family members). This gives a slightly different definition of a utility function:

where is the domain of all registered accounts and the number of accounts is not greater than the number of users, i.e. .
Initial tensor is built from the triplets (*account, item, time*) and its values are just the play counts.

In order to be able to find a correct mapping of the real users to the known accounts, the authors introduce a concept of a *virtual user*.
Within the model the real user is assumed to be a composition of particular virtual users which express the specific user’s preferences tied to certain time periods, e.g.:

where is an account from the set of all accounts , is a sub-period from the set of all non-overlapping time periods .

As the authors state, manually splitting the time data into the time slots does not fit the data well and they propose to find those sub-periods from the model. They first solve the SGD-based optimization task (18), (16) for the TD with the same weighted squared loss function as in (26) and regularization term as in (19) (with ). Once the model factors are found, the sub-periods can be obtained by clustering the time feature vectors:

Then the consolidation of virtual users into the real ones can be done in 2 steps. First, a binary similarity measure is computed between different pairs of virtual users corresponding to the same account. The second step is to combine similar virtual users so that every real user is represented as a set of virtual ones. This is done with the help of graph-based techniques. Once the real users are identified, recommendations can be produced with a user-based kNN approach. As the authors demonstrate, the proposed method not only provides a tool for user identification, but also outperforms standard kNN and TF-based methods applied without any prior clustering.

### 5.4 General context-aware models

In previous sections we have discussed TF methods targeted at specific classes of problems: keyword- or tag-based recommendations and temporal models. They all have one thing in common: the use of a third entity leading to a higher level of granularity and better predictive capabilities of a model. This leads to the idea of generalizing such an approach so that it is suitable for any model formulated in the form of (2).

#### 5.4.1 Multiverse

One of the first attempts towards this generalization is the Multiverse model [45]. The authors define context as any set of variables that influence users’ preferences and propose to model it by the -th order TD with contextual dimensions:

where factors represent a corresponding embedding of every contextual variable into a reduced latent space and all factors including and are not restricted to be orthogonal. As the authors state, the model is suitable for any contextual variables over a finite categorical domain. It should be noted, that the main focus of the work is systems with an explicit feedback and the model is optimized for the error-based metrics, which does not guarantee an optimal items ranking as has been discussed in Section 3.4.

Following the general form of an optimization objective stated in (18), the authors use the weighted loss function:

where is a pointwise loss function, that can be based on , or other types of distance measure. The example is provided for the 3rd order case, however, it can be easily generalized to a higher orders. The authors also use the same form of the regularization term as in (19), as it enables trivial optimization procedure.

In order to fight against the growing complexity for the higher order cases they propose a modification of the SGD algorithm. Within a single optimization step the updates are performed on every row of the latent factors independently. For example, an update for -th row of :

is independent of all other factors, and thus all the updates can be performed in parallel. The parameter defines the model’s learning step.

In addition to the general results on the real dataset, this work features a comprehensive experimentation on the semi-synthetic data that shows the impact of a contextual information on the RS models performance. It demonstrates that high context influence leads to better quality of the selected context-aware methods, among which the proposed TF approach gives the best results, while a context-unaware method’s quality significantly degrades.

#### 5.4.2 TFMAP

Similarly to the previously discussed PITF model, the TF for MAP optimization model (TFMAP) [83] also targets optimal ranking; however, it exploits the MAP metric instead of AUC. The model is designed for implicit feedback systems, which means that the original tensor is binary with non-zero elements reflecting the fact that the interaction has occurred:

(27) |

The optimization objective is drawn from the MAP definition:

where denotes the rank of the item in the items list of the user under the context type and is an indicator function, which is equal to 1 if the condition is satisfied and 0 otherwise, both depend on the reconstructed values of . In order to make the metric smooth and differentiable the authors propose two approximations:

where is calculated with (12) (which makes the model a CP-based) and is a sigmoid function defined by (22). Notably, , where we use the same notation as in BPTF model, see Section 5.3.1.

The model also follows the standard optimization formulation stated in (18), where the loss function is just a negative MAP gain, i.e. , and the regularization has the form of (20).

Note that MAP optimization also has a weighted form due to (27); however, the computational complexity would still be prohibitively high due to its complex structure. In order to mitigate that, the authors propose a fast learning algorithm: for each (*user, context*) pair only a limited set of representative items (a buffer) is considered, which in turn makes it possible to control the computational complexity. They also provide an efficient algorithm for sampling the “right” items and constructing the buffer, which does not harm the overall quality of the model.

#### 5.4.3 CARTD

The CARTD model (Context-Aware Recommendation Tensor Decomposition) [97, 74] provides a generalized framework for an arbitrary number of contexts and also targets an optimal ranking instead of a rating prediction. Under the hood the model extends the BPR-based ranking approach used in the PITF model to the higher order cases.

The authors introduce a unified notion of an entity. A formal task is to find the list of the most relevant entities within a given contextual situation. Remarkably, all the information, that is used to make recommendations more accurate and relevant, is defined as a context. In that sense, not only information like tag, time, location, user attributes, etc. is considered to be a context, even users themselves might be defined as a context of an item. This gives a more universal formulation for the recommendations task:

(28) |

To illustrate, a quadruple (*user, item, time, location*) maps to (*context, entity, context, context*). Obviously, the definition of the entity depends on the task. For example, when predicting social interactions from (*user, user, attribute*) triplets, the main entity as well as one of the context variables will be a user.

The observation data in a typical case of a user-item interactions can be encoded similarly to (23):

where and are the entities (i.e. items) and denotes a context type (includes users). As the authors emphasize, this leads to a huge sparsity problem, so they propose to relax the conditions and instead build the following set for learning the model:

where indicates the number of occurrences of an entity within the context . The rule denotes the prevalence of the entity over the entity with respect to all possible contexts.

The optimization objective will also look similar to the one used in the PITF model with the loss function defined as:

where denotes a set of all context variables and the tensor values are calculated with the help of the reduced CP model with pairwise-only interactions, similarly to (25):

where and are the elements at the cross section of the -th row and the -th column of the factor matrices and respectively. As in the previous cases, the regularization term has a form similar to (20), which includes all the factors from :
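The pairwise-only scoring and the BPR-style loss of this section might be sketched as follows. The sketch assumes that each context variable has its own pair of factor matrices (one for entities, one for the values of that variable) and that the loss for a preference pair is the negative log-sigmoid of the score difference; the dictionaries `E` and `C` and the context variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(entity, context, E, C):
    """Pairwise-only score: one entity-context inner product per context variable.

    E[v] holds the entity factors paired with context variable v (n_entities x rank),
    C[v] holds the factors of that variable's values (n_values x rank),
    context maps each variable name to the index of its observed value.
    """
    return sum(E[v][entity] @ C[v][idx] for v, idx in context.items())

def bpr_loss(e1, e2, context, E, C):
    """-ln sigmoid(x_e1 - x_e2): small when e1 is scored above e2 in this context."""
    return float(-np.log(sigmoid(score(e1, context, E, C) - score(e2, context, E, C))))

rng = np.random.default_rng(0)
sizes = {"user": 4, "time": 3}                 # hypothetical context variables
n_entities, rank = 5, 2
E = {v: rng.normal(size=(n_entities, rank)) for v in sizes}
C = {v: rng.normal(size=(n, rank)) for v, n in sizes.items()}
ctx = {"user": 1, "time": 2}
loss_12 = bpr_loss(0, 3, ctx, E, C)            # entity 0 preferred over entity 3
loss_21 = bpr_loss(3, 0, ctx, E, C)            # the opposite preference
```

Because no term couples two context variables directly, the cost of scoring grows linearly with the number of contexts.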

#### 5.4.4 iTALS

As has been mentioned in the introduction (see Section 3.3), implicit feedback does not always correlate with the actual user preferences, thus a simple binary scheme (as in (27)) may not be accurate enough. For this reason, the authors of the iTALS model (ALS-based implicit feedback recommender algorithm) [40] propose to use the confidence-based interpretation of implicit feedback introduced in [44] and adapt it to the higher order case.

They introduce the dense tensor that assigns non-zero weights for both observed and unobserved interactions. For the -th order tensor it has the following form:

(29) |

where is the number of occurrences of the tuple (e.g. the combination of the user and the item interacted within the set of contexts ) in the observation history; is set empirically and which means that the observed events provide more confidence in the user preferences than the unobserved ones.

The loss function will then take the form:

where weights are defined by (29), are the values of a binary feedback tensor of order , defined similarly to (27), and are the values of the reconstructed tensor.

The model uses CP with an ALS-based optimization procedure and a standard regularization similar to (20). The latent feature vectors are encoded in the rows of the factor matrices, not the columns, i.e. following the authors’ notation, we should rewrite (11) as:

where are transposed factors of the CP decomposition.
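The pieces of this model can be put together in a short sketch for the 3rd order case. It assumes that the weight of an observed tuple is proportional to its number of occurrences and that unobserved tuples share a smaller constant weight; `alpha`, `w0` and the function names are illustrative, and the factors are stored transposed (rank x dimension size), as in the notation above.

```python
import numpy as np

def confidence_weights(counts, alpha=10.0, w0=1.0):
    """Dense weights: alpha * count for observed tuples, a smaller constant w0 otherwise."""
    return np.where(counts > 0, alpha * counts, w0)

def cp_reconstruct(A, B, C):
    """CP reconstruction from transposed factors of shape (rank, n_mode)."""
    return np.einsum("ri,rj,rk->ijk", A, B, C)

def weighted_loss(counts, A, B, C, alpha=10.0, w0=1.0):
    """Confidence-weighted squared error against the binary feedback tensor."""
    W = confidence_weights(counts, alpha, w0)
    T = (counts > 0).astype(float)
    return float((W * (T - cp_reconstruct(A, B, C)) ** 2).sum())

counts = np.zeros((3, 4, 2))                   # toy (user, item, context) occurrence counts
counts[0, 1, 0] = 2                            # this tuple was seen twice
counts[2, 3, 1] = 1
rng = np.random.default_rng(0)
A, B, C = (rng.normal(scale=0.1, size=(2, n)) for n in counts.shape)
loss = weighted_loss(counts, A, B, C)
```

The sketch materializes the dense weight tensor for clarity only; the point of the original algorithm is precisely to avoid doing so.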

The authors show how efficient computation over the dense tensor can be achieved with the same tricks that are used in [44] for the matrix case. The model also has a number of modifications [41], based on the conjugate gradient approach (iTALS-CG) and the coordinate descent approach (iTALS-CD), where an additional features compression is achieved by the Cholesky decomposition. This allows the iTALS-CD model to learn even faster than MF methods. While performing at approximately the same level of accuracy as the state-of-the-art Factorization Machines (FM) method [71], it is capable of learning a more complex structure of latent relations. Another modification is the pairwise “PITF-like” reduction model, named iTALSx [39].

#### 5.4.5 GFF

The General Factorization Framework (GFF) [42] further develops the main ideas of the family of iTALS models. Within the GFF model, different CP-based factorization models (also called preference models) are combined in order to capture the intrinsic relations between users and items influenced by an arbitrary number of contexts. As in many other works, the authors of the GFF model adopt a broad definition of the context as an entity whose “value is not determined solely by the user or the item”, i.e. not content information (see Section 3.7).

The model is best explained with an example. Let us consider the problem of learning the scoring function as follows:

(30) |

where and are the domains of *users* and *items* respectively; stands for *season* and denotes the periodicity of the events (see Section 5.3); describes the sequential consumption patterns, e.g. what are the previous items that were also consumed with the current one (see [40] for a broader set of examples). Let us also define the pairwise interactions between users and items as (standard CF model), between items and seasons as and so forth. Using the same notation we can also define multi-relational interactions, such as for the 3-way user-item-season interaction or for the 4-way interaction between all 4 types of entities.

In total, there could be 2047 different combinations of interactions, yet not all of them are feasible in terms of an RS model, as not all of them may contribute to the preference model.
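One way to arrive at this number is to enumerate the interactions explicitly, assuming that every interaction involves at least two of the four dimensions and that a model is any non-empty set of interactions. The single-letter dimension labels below are placeholders chosen for this sketch.

```python
from itertools import combinations

dims = ["U", "I", "S", "Q"]        # users, items, season, sequentiality (placeholder labels)

# All interactions of order two or higher between the four dimensions.
interactions = ["".join(c) for k in range(2, len(dims) + 1) for c in combinations(dims, k)]
n_interactions = len(interactions)             # 6 pairwise + 4 three-way + 1 four-way
n_models = 2 ** n_interactions - 1             # non-empty subsets of the interactions
```

With these assumptions `n_interactions` equals 11 and `n_models` equals 2047, which matches the count given in the text.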

As a result, GFF yields a very flexible multirelational model that makes it possible to pick the most appropriate scheme of interactions, one that does not explode the complexity of the model while still achieving a high quality of recommendations. Based on the experiments the authors conclude: “leaving out useless interactions results in more accurate models”.
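A preference model built from a chosen set of interactions might be scored as in the sketch below. It assumes that each interaction contributes the sum over latent features of the elementwise product of the involved feature vectors (a CP-like term); the dimension labels, sizes and the `gff_score` helper are hypothetical and not taken from [42].

```python
import numpy as np

def gff_score(model, factors, idx):
    """Sum of CP-like terms, one per interaction in the chosen preference model.

    model   : list of interactions, each a tuple of dimension labels, e.g. ("U", "I")
    factors : maps a dimension label to its (n_values x rank) feature matrix
    idx     : maps a dimension label to the index of the value in the current event
    """
    total = 0.0
    for interaction in model:
        rows = [factors[d][idx[d]] for d in interaction]
        total += float(np.prod(rows, axis=0).sum())
    return total

rng = np.random.default_rng(0)
sizes = {"U": 4, "I": 5, "S": 3}
factors = {d: rng.normal(size=(n, 2)) for d, n in sizes.items()}
idx = {"U": 0, "I": 2, "S": 1}

pairwise = gff_score([("U", "I"), ("I", "S")], factors, idx)   # two pairwise interactions
three_way = gff_score([("U", "I", "S")], factors, idx)         # one 3-way interaction
```

Adding or removing an interaction only changes the `model` list, which is what makes it cheap to compare alternative interaction schemes.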

| Name | Type | Algorithm | Domain | Entities | Optimization | Ranking prediction | Online |
|---|---|---|---|---|---|---|---|
| TOPHITS [49], 2005 | CP | ALS | Link prediction | Resources, Keyword | pointwise | Yes | No |
| CubeSVD [86], 2005 | TD | HOSVD | Personalized Search | User, Resource, Query | pointwise | Yes | No |
| RTF [72], 2009 | TD | SGD | Folksonomy | User, Item, Tag | pairwise | Yes | No |
| BPTF [99], 2010 | CP | MCMC | Temporal | User, Item, Time | pointwise | No | No |
| Multiverse [45], 2010 | TD | SGD | Context-awareness | User, Item, Contexts | pointwise | No | No |
| PITF [73], 2010 | CP | SGD | Folksonomy | User, Item, Tag | pairwise | Yes | No |
| TagTR [90], 2010 | TD | HOSVD | Folksonomy | User, Item, Tag | pointwise | Yes | Yes |
| TFMAP [83], 2012 | CP | SGD | Context-awareness | User, Item, Context | listwise | Yes | No |
| CARTD [74], 2012 | CP | SGD | Context-awareness | Item, Contexts | pairwise | Yes | No |
| ClustHOSVD [88], 2015 | TD | HOOI | Folksonomy | User, Item, Tag | pointwise | Yes | No |
| GFF [42], 2015 | CP | ALS | Context-awareness | User, Item, Contexts | pointwise | Yes | No |

The *Ranking prediction* column shows whether a method is evaluated against ranking metrics. The *Online* column denotes a support for real-time recommendations for new users (e.g. folding-in).

Method uses pairwise reduction concept, initially introduced in PITF.

We have reviewed so far a diverse set of tensor-based recommendation techniques. Clearly, tensors help represent and model complex environments in a convenient and congruent way, suitable for various problem formulations. Nevertheless, as we have already stated earlier, the most common practical task for RS is to build a ranked list of recommendations (a top-n recommendations task). In this regard, we summarize in Table 2 the related features of some of the methods that are, in our opinion, the most illustrative. We also take into account support for real-time scenarios in dynamic environments.

### 5.5 Other models

Unfortunately, it is almost impossible to review all available TF models from various domains. The flexibility that comes with the tensor-based formulation provides means for limitless combinations of various RS settings and models. Here we briefly describe some that have not been referenced yet but have an interesting application and/or implementation.

##### Social interactions.

The authors of [52] focus on recommending new connections for users in specific communities like online dating or professional networks. They emphasize that typically there are two types of groups of people (for instance, employee and employer) in job seeking networks. In order to account for that and avoid unnecessary recommendations within the same group (e.g. employer-employer) they split the problem into two parallel subproblems corresponding to each individual group and model it with the CP. The final result is then aggregated from both subproblems by selecting only those predicted links (i.e. recommendations) which are present in both groups.

##### Social tagging.

A few works on image tagging [66, 7] use an interesting representation of data, initially proposed in [25]. Users and the images they upload to a social network are encoded together into a single long vector. These vectors are used to build a set of adjacency matrices, which are constructed with respect to certain conditions and then stacked together in a tensor. With this approach, every frontal slice of the tensor describes a different kind of relation: friendship relations between users, user-image connections, tag relations for both users and items, etc.
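The stacking idea can be illustrated with a toy example. The sketch assumes a joint index space in which users come first and images follow, and uses only two relation types; the sizes and the particular relations are made up for illustration.

```python
import numpy as np

n_users, n_images = 3, 2
n = n_users + n_images                       # one joint index space: users first, then images

friendship = np.zeros((n, n))
friendship[0, 1] = friendship[1, 0] = 1      # users 0 and 1 are friends

uploads = np.zeros((n, n))
uploads[2, n_users + 0] = 1                  # user 2 uploaded image 0
uploads[0, n_users + 1] = 1                  # user 0 uploaded image 1

# Each adjacency matrix becomes one frontal slice X[:, :, k] of the tensor.
X = np.stack([friendship, uploads], axis=2)
```

Further relation types, such as tag co-occurrence, would simply add more slices along the third mode.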

##### Temporal models.

The authors of [102] add a so-called *social regularization*, introduced in [56], into a standard optimization routine. The idea behind this modification is to use not only the “wisdom of crowds”, as in the standard CF approach, but also information about the social relationships (i.e. friendship) of the user, in order to bring more trust into the recommendations and improve the overall accuracy.

The work [59] combines both social tagging and temporal models. The authors build a 4th-order tensor from (*user, item, tag, time*) quadruples and decompose it with the HOSVD. In order to estimate relevance scores and recommend new items to users, they first summarize the values corresponding to the observations over the third (tag) mode.

An interesting hybrid approach for modelling the dynamics of user preferences is proposed in [69]. The authors build a tensor from (*user, item, time-period*) triplets and combine it with auxiliary content information (user attributes) with the help of a *coupled tensor-matrix factorization method* [1, 28]. The idea of coupled tensor-matrix factorizations provides an additional level of flexibility in model construction and is used in various RS domains with complex setup [101, 9].
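The coupling can be made concrete through the objective that such methods minimize. The sketch below assumes the commonly used formulation in which a CP model of the tensor and a low-rank model of the auxiliary matrix share the factor of the common (user) mode; the specific model of [69] may differ in details, and all names and sizes are illustrative.

```python
import numpy as np

def coupled_objective(X, Y, A, B, C, V):
    """||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, with the user factor A shared by both terms."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)    # CP model of the (user, item, period) tensor
    Y_hat = A @ V.T                                # low-rank model of the (user, attribute) matrix
    return float(((X - X_hat) ** 2).sum() + ((Y - Y_hat) ** 2).sum())

rng = np.random.default_rng(0)
n_users, n_items, n_periods, n_attrs, rank = 4, 5, 3, 6, 2
A, B, C, V = (rng.normal(size=(n, rank)) for n in (n_users, n_items, n_periods, n_attrs))

# Synthetic data generated by the model itself, so the true factors give zero error.
X = np.einsum("ir,jr,kr->ijk", A, B, C)
Y = A @ V.T
exact = coupled_objective(X, Y, A, B, C, V)
perturbed = coupled_objective(X, Y, A + 0.1, B, C, V)
```

Since `A` appears in both terms, the attribute matrix constrains the user factors even for users with very few entries in the tensor.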

##### Multi-criteria ratings systems.

The authors of [95] explore the rich sentiment information in product reviews. They extract opinions from text and craft a multi-aspect (or multi-criteria, see [2]) ratings system on top of it. This data is used to build a third order tensor in the form (*user, item, aspect*), with tensor values denoting the ratings within each aspect (including the explicit ratings). A CP-based factorization model is used to reconstruct missing values and predict unknown ratings more accurately.

A similar idea of multi-criteria ratings model was also used as a part of a sophisticated model in [62]. However, the authors did not have to do any text analysis as the aspect data was populated by users themselves and provided within the dataset. They also applied the HOSVD method instead of the CP.

##### Mobility and geolocation.

Modern social networks allow users to share not only content, such as images or videos, but also to link that content to specific locations using Global Positioning System (GPS) services. With the broad access of mobile devices to the internet, this provides rich information about user interests and behavior and allows building highly personalized context-aware services and applications. For example, the authors of [91, 89] model location-based user activities with a third order tensor (*user, activity, location*) for providing location and activity recommendations. The authors of [78] use the tensor for personalized activity rating prediction. These works use the Tucker tensor format and apply the HOSVD for its reconstruction.

##### Cross-domain recommendations.

Another interesting direction is combining cross-domain knowledge, e.g. user consumption patterns of books, movies, music, etc., in order to improve recommendations quality. Knowledge about the patterns from one domain may help to build more accurate predictions in another (this is so-called *knowledge transfer* learning). Moreover, modelling these cross-domain relations jointly may also help to achieve a higher recommendations quality across all domains. An interesting challenge in these tasks is the varying number of items in different domains, which requires special treatment. A few notable and quite different techniques of tensor-based knowledge transfer learning are proposed in [15] and [43].

##### Special factorization methods.

In the theory of matrix approximations there is a well-known pseudo-skeleton decomposition method [33], which makes it possible to use only a small sample of the original matrix elements in order to find an approximate matrix factorization within the desired accuracy. This result is shown to be generalizable to the higher order case [65, 63] and, remarkably, is especially suitable for sparse data. The main benefit of such a sampling approach is the reduced factorization complexity in terms of both the number of operations and the number of elements required for computation, which is especially advantageous in the case of tensor-based models. A special case of such a class of TF algorithms is used in the TensorCUR model [57] for product recommendations.
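The matrix case of this idea fits in a few lines. The sketch below reconstructs an exactly low-rank matrix from a handful of its rows and columns; it picks the rows and columns at random for simplicity, which is an assumption of this example, whereas practical methods select them more carefully to keep the intersection submatrix well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 40, 3
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))    # exactly rank-3 matrix

rows = rng.choice(m, size=r, replace=False)              # sampled row indices
cols = rng.choice(n, size=r, replace=False)              # sampled column indices

C = A[:, cols]                        # r sampled columns
R = A[rows, :]                        # r sampled rows
G = A[np.ix_(rows, cols)]             # r x r submatrix at their intersection
A_skel = C @ np.linalg.solve(G, R)    # skeleton approximation C G^{-1} R

n_used = C.size + R.size - G.size     # matrix entries actually read, out of m * n
rel_err = np.linalg.norm(A - A_skel) / np.linalg.norm(A)
```

Only `n_used` of the `m * n` entries are touched, which is the source of the reduced complexity mentioned above.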

## 6 Conclusion

In this survey we have attempted to overview a broad range of tensor-based methods used in recommender systems to date. As we have seen, these methods provide a powerful set of tools for merging various types of additional information, which increases the flexibility, customizability and quality of recommendation models. Tensorization enables creative and non-trivial setups, going far beyond the standard user-item paradigm, and finds its applications in various domains. Tensor-based models can also be used as a part of more elaborate systems, providing compressed latent representations as an input for other well-developed techniques.

One of the main concerns for higher order models is the inevitable growth of computational complexity with an increasing number of dimensions. Even for mid-sized production systems that have to deal with highly dynamic environments, this might have negative implications, such as the inability to produce recommendations for new users in a timely manner. This type of issue can be addressed with incremental update and higher order folding-in techniques. The former allow updating the entire model while performing computations only on new data. The latter allow calculating recommendations in cases when new data is already present in the system but has not yet been included in the model.

Despite the encouraging results, there is an issue related to the applicability of the CP and TD decompositions. When the number of dimensions becomes much higher than 3, the application of TD-based methods becomes infeasible due to the explosion of storage requirements. On the other hand, CP is generally ill-posed, which may potentially lead to numerical instabilities. A possible cure for this problem is to use the TT/HT decomposition. In our opinion, this is a promising direction for further investigations.
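The storage argument can be illustrated by counting parameters. The sketch below assumes, for simplicity, that all modes have the same size `n` and all ranks equal `r`, and that the number of dimensions `d` is at least 2; the function names are illustrative.

```python
def cp_params(d, n, r):
    """CP: d factor matrices of size n x r."""
    return d * n * r

def tucker_params(d, n, r):
    """Tucker: d factor matrices plus a dense core with r**d entries."""
    return d * n * r + r ** d

def tt_params(d, n, r):
    """TT: two boundary cores of size n x r and d - 2 inner cores of size r x n x r."""
    return 2 * n * r + (d - 2) * n * r * r

n, r = 1000, 10
table = {d: (cp_params(d, n, r), tucker_params(d, n, r), tt_params(d, n, r)) for d in (3, 6, 10)}
```

For `d = 10` the Tucker core alone holds `10 ** 10` entries, while the TT count stays below a million and grows only linearly in `d`.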

## 7 Acknowledgements

The authors would like to thank Maxim Rakhuba and Alexander Fonarev for their help in improving the manuscript, and also Michael Thess for insightful conversations.

## References

- [1] E. Acar, T. G. Kolda, and D. M. Dunlavy. All-at-once Optimization for Coupled Matrix and Tensor Factorizations. arXiv Prepr. arXiv1105.3422, 2011.
- [2] G. Adomavicius, N. Manouselis, and Y. Kwon. Multi-criteria recommender systems. Recomm. Syst. Handbook, pages 769–803, 2011.
- [3] G. Adomavicius, B. Mobasher, F. Ricci, and A. Tuzhilin. Context-Aware Recommender Systems. AI Mag., 32(3):67–80, 2011.
- [4] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Inf. Syst., 23(1):103–145, 2005.
- [5] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: a Survey of the State of the Art and Possible Extensions. IEEE Trans. Knowl. Data Eng., 17(6):734–749, 2005.
- [6] J. Allan, B. Croft, A. Moffat, M. Sanderson, J. Aslam, L. Azzopardi, N. Belkin, P. Borlund, P. Bruza, J. Callan, et al. Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012 the Second Strategic Workshop on Information Retrieval in Lorne. SIGIR Forum, 46(1):2–32, 2012.
- [7] P. Barmpoutis, C. Kotropoulos, and K. Pliakos. Image tag recommendation based on novel tensor structures and their decompositions. In Image Signal Process. Anal. (ISPA), 2015 9th Int. Symp. on. IEEE, pages 7–12, 2015.
- [8] M. Bazire and P. Brézillon. Understanding context before using it. In Model. using Context, pages 29–40. 2005.
- [9] P. Bhargava, T. Phan, J. Zhou, and J. Lee. Who, What, When, and Where: Multi-Dimensional Collaborative Recommendations Using Tensor Factorization on Sparse User-Generated Data. Proc. 24th Int. Conf. World Wide Web, pages 130–140, 2015.
- [10] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowledge-Based Syst., 46:109–132, 2013.
- [11] M. Brand. Incremental singular value decomposition of uncertain data with missing values. In Comput. Vision-ECCV 2002, pages 707–720. Springer, 2002.
- [12] P. Brusilovsky. Social information access: the other side of the social web. In SOFSEM 2008 Theory Pract. Comput. Sci., pages 5–22. Springer, 2008.
- [13] R. Burke. Hybrid web recommender systems. Adapt. web, pages 377–408, 2007.
- [14] O. Chapelle and M. Wu. Gradient descent optimization of smoothed information retrieval metrics. Inf. Retr. Boston., 13(3):216–235, 2010.
- [15] W. Chen, W. Hsu, and M. L. Lee. Making recommendations from multiple domains. In Proc. 19th ACM SIGKDD Int. Conf. Knowl. Discov. data Min. - KDD ’13, page 892, New York, New York, USA, aug 2013. ACM Press.
- [16] Y. Chi and S. Zhu. FacetCube. Proc. 19th ACM Int. Conf. Inf. Knowl. Manag. - CIKM ’10, page 569, 2010.
- [17] Y. Chi, S. Zhu, Y. Gong, and Y. Zhang. Probabilistic polyadic factorization and its application to personalized recommendation. In Int. Conf. Inf. Knowl. Manag. Proc., pages 941–950, 2008.
- [18] A. Cichocki, R. Zdunek, A. H. Phan, and S. I. Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. 2009.
- [19] P. Comon. Tensors: A brief introduction. IEEE Signal Process. Mag., 31(3):44–53, 2014.
- [20] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. Proc. fourth ACM Conf. Recomm. Syst. - RecSys ’10, 2010.
- [21] L. De Lathauwer, B. De Moor, and J. Vandewalle. A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl., 21(4):1253–1278, jan 2000.
- [22] V. De Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl., 30(3):1084–1127, 2008.
- [23] S. Doerfel and K. D. E. Group. The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems. ACM Trans. Intell. Syst. Technol., 7(3), 2016.
- [24] P. Dourish. What we talk about when we talk about context. Pers. Ubiquitous Comput., 8(1):19–30, 2004.
- [25] D. M. Dunlavy, T. G. Kolda, and W. P. Kegelmeyer. Multilinear algebra for analyzing data with multiple linkages. Graph Algorithms Lang. Linear Algebr., (April 2006):85–114, 2011.
- [26] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
- [27] M. D. Ekstrand, J. T. Riedl, and J. A. Konstan. Collaborative filtering recommender systems. Found. Trends Human-Computer Interact., 4(2):81–173, 2011.
- [28] B. Ermis, E. Acar, and A. T. Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Min. Knowl. Discov., 29(1):203–236, 2013.
- [29] X. Fu, K. Huang, W.-K. Ma, N. D. Sidiropoulos, and R. Bro. Joint Tensor Factorization and Outlying Slab Suppression With Applications. Signal Process. IEEE Trans., 63(23):6315–6328, 2015.
- [30] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum. Information retrieval using a singular value decomposition model of latent semantic structure. In Proc. 11th Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr., pages 465–480. ACM, 1988.
- [31] Z. Gantner, S. Rendle, and L. Schmidt-Thieme. Factorization models for context-/time-aware movie recommendations. Proc. Work. Context. Movie Recomm., pages 14–19, 2010.
- [32] G. H. Golub and C. Reinsch. Singular value decomposition and least squares solutions. Numer. Math., 14(5):403–420, 1970.
- [33] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. A Theory of Pseudoskeleton Approximations. Linear Algebra Appl., 261:1–21, 1997.
- [34] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
- [35] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. GAMM Mitteilungen, 36(1):53–78, 2013.
- [36] W. Hackbusch. Tensor spaces and numerical tensor calculus, volume 42. Springer Science & Business Media, 2012.
- [37] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. The elements of statistical learning: data mining, inference and prediction. Math. Intell., 27(2):83–85, 2005.
- [38] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004.
- [39] B. Hidasi. Factorization models for context-aware recommendations. Infocommun J, VI(4):27–34, dec 2014.
- [40] B. Hidasi and D. Tikk. Fast ALS-based tensor factorization for context-aware recommendation from implicit feedback. In Mach. Learn. Knowl. Discov. Databases, pages 67–82. 2012.
- [41] B. Hidasi and D. Tikk. Context-aware recommendations from implicit data via scalable tensor factorization. arXiv Prepr. arXiv1309.7611, 2013.
- [42] B. Hidasi and D. Tikk. General factorization framework for context-aware recommendations. Data Min. Knowl. Discov., may 2015.
- [43] L. Hu, J. Cao, G. Xu, L. Cao, Z. Gu, and C. Zhu. Personalized Recommendation via Cross-Domain Triadic Factorization. Proc. 22nd Int. Conf. World Wide Web . Int. World Wide Web Conf. Steer. Comm., pages 595–605, 2013.
- [44] Y. Hu, C. Volinsky, and Y. Koren. Collaborative filtering for implicit feedback datasets. Proc. - IEEE Int. Conf. Data Mining, ICDM, pages 263–272, 2008.
- [45] A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proc. fourth ACM Conf. Recomm. Syst. - RecSys ’10, page 79, 2010.
- [46] B. P. Knijnenburg and M. C. Willemsen. Evaluating Recommender Systems with User Experiments. In Recomm. Syst. Handbook, pages 309–352. Springer, 2015.
- [47] T. Kolda and B. Bader. The TOPHITS model for higher-order web link analysis. Work. Link Anal. Counterterrorism Secur., 7:26–29, 2006.
- [48] T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Rev., 51(3):455–500, aug 2009.
- [49] T. G. Kolda, B. W. Bader, and J. P. Kenny. Higher-order web link analysis using multilinear algebra. In Proc. - IEEE Int. Conf. Data Mining, ICDM, pages 242–249, 2005.
- [50] J. A. Konstan and J. Riedl. Recommender systems: From algorithms to user experience. User Model. User-adapt. Interact., 22(1-2):101–123, 2012.
- [51] Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. In Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discov. data Min., pages 426–434, 2008.
- [52] S. Kutty, L. Chen, and R. Nayak. A people-to-people recommendation system using tensor space models. Proc. 27th Annu. ACM Symp. Appl. Comput. - SAC ’12, page 187, 2012.
- [53] J. Lee, D. Lee, Y.-C. Lee, W.-S. Hwang, and S.-W. Kim. Improving the Accuracy of Top-N Recommendation using a Preference Model. Inf. Sci. (Ny)., 2016.
- [54] Z. Liao, C. Wang, and M. Zhang. A Tripartite Tensor Decomposition Fold-in for Social Tagging. J. Appl. Sci. Eng., 17(4):363–370, 2014.
- [55] P. Lops, M. D. Gemmis, and G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. Recomm. Syst. Handbook, pages 73–105, 2011.
- [56] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proc. fourth ACM Int. Conf. Web search data Min., pages 287–296. ACM, 2011.
- [57] M. W. Mahoney, M. Maggioni, and P. Drineas. Tensor-CUR Decompositions for Tensor-Based Data. SIAM J. Matrix Anal. Appl., 30(3):957–987, 2008.
- [58] L. B. Marinho, A. Nanopoulos, L. Schmidt-Thieme, R. Jäschke, A. Hotho, G. Stumme, and P. Symeonidis. Social Tagging Recommender Systems. Recomm. Syst. Handbook, pages 615–644, 2010.
- [59] N. Misaghian, M. Jalali, and M. H. Moattar. Resource recommender system based on tag and time for social tagging system. Proc. 3rd Int. Conf. Comput. Knowl. Eng. ICCKE 2013, pages 97–101, 2013.
- [60] A. Nanopoulos, D. Rafailidis, P. Symeonidis, and Y. Manolopoulos. MusicBox: Personalized Music Recommendation Based on Cubic Analysis of Social Tags. IEEE Trans. Audio. Speech. Lang. Processing, 18(2):407–412, 2010.
- [61] C. Navasca, L. De Lathauwer, and S. Kindermann. Swamp reducing technique for tensor decomposition. In Signal Processing Conference, 2008 16th European, pages 1–5. IEEE, 2008.
- [62] M. Nilashi, O. B. Ibrahim, and N. Ithnin. Multi-criteria collaborative filtering with high accuracy using higher order singular value decomposition and Neuro-Fuzzy system. Knowledge-Based Syst., 60:82–101, 2014.
- [63] I. Oseledets and E. Tyrtyshnikov. TT-cross approximation for multidimensional arrays. Linear Algebra Appl., 432(1):70–88, 2010.
- [64] I. V. Oseledets. Tensor-Train Decomposition. SIAM J. Sci. Comput., 33(5):2295–2317, 2011.
- [65] I. V. Oseledets, D. V. Savostianov, and E. E. Tyrtyshnikov. Tucker Dimensionality Reduction of Three-Dimensional Arrays in Linear Time. SIAM J. Matrix Anal. Appl., 30(3):939–956, 2008.
- [66] M. Panagopoulos and C. Kotropoulos. Image tagging using tensor decomposition. In Information, Intell. Syst. Appl. (IISA), 2015 6th Int. Conf., 2015.
- [67] D. Parra and P. Brusilovsky. Collaborative filtering for social tagging systems: an experiment with CiteULike. In Proc. third ACM Conf. Recomm. Syst., pages 237–240. ACM, 2009.
- [68] D. Rafailidis and P. Daras. The TFC model: Tensor factorization and tag clustering for item recommendation in social tagging systems. IEEE Trans. Syst. Man, Cybern. Part A Syst. Humans, 43(3):673–688, 2013.
- [69] D. Rafailidis and A. Nanopoulos. Modeling the Dynamics of User Preferences in Coupled Tensor Factorization. In Proc. 8th ACM Conf. Recomm. Syst. - RecSys ’14, pages 321–324, 2014.
- [70] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. Proc. Twenty-Fifth Conf. Uncertain. Artif. Intell., cs.LG:452–461, 2009.
- [71] S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proc. 34th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr., pages 635–644. ACM, 2011.
- [72] S. Rendle and L. B. Marinho. Learning optimal ranking with tensor factorization for tag recommendation. Kdd, pages 727–736, 2009.
- [73] S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. Proc. third ACM Int. Conf. Web search data Min. (WSDM ’10), pages 81–90, 2010.
- [74] A. Rettinger, H. Wermser, Y. Huang, and V. Tresp. Context-aware tensor decomposition for relation prediction in social networks. Soc. Netw. Anal. Min., 2(4):373–385, 2012.
- [75] F. Ricci, L. Rokach, and B. Shapira. Recommender Systems: Introduction and Challenges, pages 1–34. Springer, 2015.
- [76] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. Int. J. Comput. Vis., 77(1-3):125–141, 2008.
- [77] A. Said and A. Bellogín. Comparative recommender system evaluation. Proc. 8th ACM Conf. Recomm. Syst. - RecSys ’14, pages 129–136, 2014.
- [78] M. Sattari, I. H. Toroslu, P. Karagoz, P. Symeonidis, and Y. Manolopoulos. Extended feature combination model for recommendations in location-based mobile services. Knowl. Inf. Syst., 44(3):629–661, 2014.
- [79] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. The Adaptive Web: Methods and Strategies of Web Personalization, chapter Collaborat, pages 291–324. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
- [80] L. Shan, L. Lin, C. Sun, and X. Wang. Predicting ad click-through rates via feature-based fully coupled interaction tensor factorization. Electron. Commer. Res. Appl., 16:30–42, 2016.
- [81] G. Shani and A. Gunawardana. Evaluating recommendation systems. Recomm. Syst. Handbook, pages 257–298, 2011.
- [82] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis, volume 47. 2004.
- [83] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver. TFMAP: Optimizing MAP for top-n context-aware recommendation. Proc. 35th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr., pages 155–164, 2012.
- [84] Y. Shi, M. Larson, and A. Hanjalic. Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges. ACM Comput. Surv., 47(1):1–45, 2014.
- [85] S. Sizov, S. Staab, and T. Franz. Analysis of Social Networks by Tensor Decomposition, pages 45–58. Springer, 2010.
- [86] J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, and Z. Chen. CubeSVD: A Novel Approach to Personalized Web Search. In Proc. 14th Int. Conf. World Wide Web - WWW ’05, page 382, 2005.
- [87] P. Symeonidis. User recommendations based on tensor dimensionality reduction. IFIP Int. Fed. Inf. Process., 296:331–340, 2009.
- [88] P. Symeonidis. ClustHOSVD: Item Recommendation by Combining Semantically Enhanced Tag Clustering With Tensor HOSVD. IEEE Trans. Syst. Man, Cybern. Syst., pages 1–12, 2015.
- [89] P. Symeonidis, A. Krinis, and Y. Manolopoulos. GeoSocialRec: Explaining recommendations in location-based social networks. In Adv. Databases Inf. Syst., pages 84–97, 2013.
- [90] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis. IEEE Trans. Knowl. Data Eng., 22(2):179–192, 2010.
- [91] P. Symeonidis, A. Papadimitriou, Y. Manolopoulos, P. Senkul, and I. Toroslu. Geo-social recommendations based on incremental tensor reduction and local path traversal. Proc. 3rd ACM SIGSPATIAL Int. Work. Locat. Soc. Networks - LBSN ’11, page 1, 2011.
- [92] P. Symeonidis, M. Ruxanda, A. Nanopoulos, and Y. Manolopoulos. Ternary Semantic Analysis of Social Tags for Personalized Music Recommendation. In ISMIR, volume 8, pages 219–224, 2008.
- [93] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Tag recommendations based on tensor dimensionality reduction. ACM Conf. Recomm. Syst., page 7, 2008.
- [94] L. Wang, X. Meng, and Y. Zhang. Applying HOSVD to alleviate the sparsity problem in Context-aware recommender systems. Chinese J. Electron., 22(4):773–778, 2013.
- [95] Y. Wang, Y. Liu, and X. Yu. Collaborative filtering with aspect-based opinion mining: A tensor factorization approach. In Proc. - IEEE Int. Conf. Data Mining, ICDM, pages 1152–1157, 2012.
- [96] Z. Wang and L. He. User identification for enhancing IP-TV recommendation. Knowledge-Based Syst., 2016.
- [97] H. Wermser, A. Rettinger, and V. Tresp. Modeling and learning context-aware recommendation scenarios using tensor decomposition. In Proc. - 2011 Int. Conf. Adv. Soc. Networks Anal. Mining, ASONAM 2011, pages 137–144, 2011.
- [98] W. Woerndl and J. Schlichter. Introducing Context into Recommender Systems. AAAI’07 Work. Recomm. Syst. e-Commerce, pages 138–140, 2007.
- [99] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. Proc. SIAM Int. Conf. Data Min., pages 211–222, 2010.
- [100] Y. Xu, L. Zhang, and W. Liu. Cubic Analysis of Social Bookmarking for Personalized Recommendation. Front. WWW Res. Dev. - APWeb 2006, pages 733–738, 2006.
- [101] Z. Yang, D. Yin, and B. D. Davison. Recommendation in Academia: A joint multi-relational model. In ASONAM 2014 - Proc. 2014 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min., pages 566–571. IEEE, Aug. 2014.
- [102] L. Yao, Q. Z. Sheng, Y. Qin, X. Wang, A. Shemshadi, and Q. He. Context-aware Point-of-Interest Recommendation Using Tensor Factorization with Social Regularization. SIGIR 2015 Proc. 38th Int. ACM SIGIR Conf. Res. Dev. Inf., pages 1007–1010, 2015.
- [103] H. Zha and H. Simon. On updating problems in latent semantic indexing. SIAM J. Sci. Comput., 21(2):1–9, 1999.
- [104] M. Zhang, C. Ding, and Z. Liao. Tensor fold-in algorithms for social tagging prediction. In Proc. - IEEE Int. Conf. Data Mining, ICDM, pages 1254–1259, 2011.
- [105] W. Zhang, H. Sun, X. Liu, and X. Guo. An Incremental Tensor Factorization Approach for Web Service Recommendation. 2014 IEEE Int. Conf. Data Min. Work., pages 346–351, 2014.
- [106] X. Zhao, X. Li, L. Liao, D. Song, and W. K. Cheung. Crafting a Time-Aware Point-of-Interest Recommendation via Pairwise Interaction Tensor Factorization. In Knowl. Sci. Eng. Manag., pages 458–470. Springer, 2015.
- [107] G. Zhou, A. Cichocki, Q. Zhao, and S. Xie. Nonnegative matrix and tensor factorizations: An algorithmic perspective. IEEE Signal Process. Mag., 31(3):54–65, 2014.