Matching Media Contents with User Profiles by means of the Dempster-Shafer Theory

The media industry is increasingly personalizing its content offering in an attempt to better target the audience. This requires analyzing the relationships established between users and the content they enjoy, looking on one side at the content characteristics and on the other at the user profile, in order to find the best match between the two. In this paper we propose to build that relationship using the Dempster-Shafer Theory of Evidence, presenting a reference model and illustrating its properties by means of a toy example. Finally, we suggest possible applications of the model to tasks that are common in the modern media industry.

I Introduction

Digital technologies are radically changing the way business is done in the media industry, opening new possibilities for tailoring the catalog so that everybody can enjoy the contents that best fit his/her interests, often on demand and at the time most appropriate for each user. Such a change requires rethinking the way the content offering is built. Data collected from customers regarding their profiles and preferences become central, and so do models able to interpret and reason about these data.

These models aim to discover and exploit the relationship between users and the media contents they enjoy. Here the problem is not to ask users directly what their interests and preferences are, but to infer them by looking at the contents they access and at the feedback they provide about them. The ultimate goal is to learn from data a model able to link users to the vast catalog of contents made available by a large media company.

Looking at past interactions helps users discover contents that they would appreciate as a valuable part of the product they paid for. This improves customer retention and fosters upgrades towards more profitable products. The benefits coming from the implementation and use of these models go beyond existing contents and customers: they also help to propose new contents to existing customers and, conversely, to support new customers in discovering existing contents. New contents and new customers soon become part of the model, enriching the dataset with new entities in a self-growing process. The predictive power of these models also makes them suitable to support the acquisition of new contents and customers.

These models are at the core of recommender systems (RS), which attracted wide attention once Netflix showed the potential of algorithms in developing and supporting its streaming platform [1]. Recommender systems became widespread with the diffusion of e-commerce. They are generally grouped into different types, including content-based recommenders [2], collaborative recommenders [3], demographic recommenders [4], and hybrid recommenders [5].

The purpose of a recommender system is to provide a suggestion among the available alternatives by scoring and ranking them according to the user preferences. To accomplish this task, a recommender system requires information about the user profile and habits with respect to the different alternatives that can be proposed. This information can be acquired explicitly, by asking users to rate items, or implicitly, by monitoring users' behavior (e.g., booked hotels or songs listened to). RS can also use other kinds of information, such as demographic features (e.g., age, gender) or social information. Research on RS has focused on movies, music and books [6], with music recommendation being the most studied topic, although RS have later been applied to other e-commerce domains [7].

As in RS, we need data about user preferences regarding catalog items such as movies, series and shows. Such information can be gathered explicitly by asking users to rate the items, e.g., by using stars or likes, or implicitly by monitoring customer behavior, e.g., which items were enjoyed fully and which only partially, how often users accessed the content description, etc. In addition, we need other information regarding demographics, such as age, gender, family members, job, etc. The objective is to relate user profiles to content descriptors. Different techniques have been tried in order to discover and exploit this relationship; most of them take the form of information fusion.

Following the idea explored in [8], and more concretely the model developed in [9], we aim to build a relationship model based on the Dempster-Shafer Theory of Evidence (D-S theory) [10, 11] and to use it to make inferences about the relationship between users and contents. The remainder of this paper is organized as follows: Section II provides some preliminaries on D-S theory; Section III describes the model; Section IV outlines some applications; Section V draws conclusions and future directions.

II Preliminaries

The Dempster-Shafer theory, also known as the Theory of Evidence [10, 11], is the basis of the preference model presented in [9]. In D-S theory, basic probabilities are allocated to subsets, instead of elements, according to the following definitions.

Definition 1.

A function $m: 2^X \to [0, 1]$ over a set $X$ is called a basic probability assignment if $m(\emptyset) = 0$ and $\sum_{A \subseteq X} m(A) = 1$.

Definition 2.

Let $A \subseteq X$; then $A$ is a focal element if $m(A) > 0$. In addition, $\mathcal{F}_m$ represents the set of focal elements induced by $m$.

Definition 3.

Let $m$ be a basic probability assignment function over a set $X$. The Belief of $A \subseteq X$ induced by $m$ is defined as follows:

$$Bel(A) = \sum_{B \subseteq A} m(B) \tag{1}$$

Definition 4.

Let $m$ be a basic probability assignment function over a set $X$. The Plausibility of $A \subseteq X$ induced by $m$ is defined as follows:

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{2}$$

The relationship between Plausibility and Belief is given by the following equation:

$$Pl(A) = 1 - Bel(\bar{A}) \tag{3}$$

where $\bar{A} = X \setminus A$ is the complement of $A$ with respect to $X$.
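As a minimal sketch of Eqs. (1)-(3), the following Python code (not taken from [9]) stores a basic probability assignment as a dictionary mapping focal elements, represented as frozensets, to their masses, and computes Belief and Plausibility by brute force. The universe $X = \{a, b, c, d\}$ and the masses are hypothetical; the final loop checks Eq. (3) on every node of the lattice.

```python
import math
from itertools import chain, combinations


def powerset(universe):
    """All subsets of `universe`, as frozensets (the Boolean lattice 2^X)."""
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def belief(m, a):
    """Eq. (1): total mass of the focal elements contained in `a`."""
    return sum(v for f, v in m.items() if f <= a)


def plausibility(m, a):
    """Eq. (2): total mass of the focal elements that intersect `a`."""
    return sum(v for f, v in m.items() if f & a)


# Hypothetical BPA over X = {a, b, c, d}: {focal element: mass}, masses sum to 1.
X = frozenset("abcd")
m = {frozenset("a"): 0.2, frozenset("bc"): 0.5, frozenset("cd"): 0.3}

A = frozenset("ab")
print(f"Bel={belief(m, A):.2f}  Pl={plausibility(m, A):.2f}")  # Bel=0.20  Pl=0.70

# Eq. (3): Pl(A) = 1 - Bel(complement of A), checked on the whole lattice.
for s in powerset(X):
    assert math.isclose(plausibility(m, s), 1 - belief(m, X - s), abs_tol=1e-12)
```

Brute-force enumeration of $2^X$ is only practical for toy universes like this one.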

When the basic probability assignments are given by different sources, they can be combined. The first and most common combination method is known as Dempster's rule, which is defined as follows:

Definition 5.

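Let $m_1$ and $m_2$ be two basic probability assignments; the joint basic probability assignment $m_{1,2}$ is computed, for any $A \neq \emptyset$, as

$$m_{1,2}(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C) \tag{4}$$

where

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) \tag{5}$$

is a measure of conflict between the two basic probability assignments. In addition, it is assumed that $m_{1,2}(\emptyset) = 0$.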
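Dempster's rule in Eqs. (4)-(5) can be sketched the same way. The function name `dempster` and the two toy sources below are illustrative assumptions, not part of the original model; the function raises an error in the case of total conflict ($K = 1$), where Eq. (4) is undefined.

```python
from collections import defaultdict


def dempster(m1, m2):
    """Combine two BPAs ({frozenset: mass}) with Dempster's rule, Eqs. (4)-(5).

    Returns the joint BPA and the conflict K.
    """
    joint = defaultdict(float)
    conflict = 0.0  # K in Eq. (5): mass of the pairs with empty intersection
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                joint[inter] += mb * mc
            else:
                conflict += mb * mc
    if not joint:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    # Eq. (4): renormalize by 1 - K so that the joint masses sum to 1.
    return {a: v / (1.0 - conflict) for a, v in joint.items()}, conflict


# Two hypothetical sources over X = {a, b}.
m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
m2 = {frozenset("b"): 0.5, frozenset("ab"): 0.5}
m12, K = dempster(m1, m2)
print(f"K = {K:.2f}")  # K = 0.30
for focal, mass in sorted(m12.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 3))
```

With these inputs the unnormalized joint masses of $\{a\}$, $\{b\}$ and $\{a, b\}$ are 0.3, 0.2 and 0.2, and each is divided by $1 - K = 0.7$.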

Belief and Plausibility are monotonic (non-decreasing) functions with respect to inclusion. This means that, if we consider the lattice of subsets shown in Fig. 1, Belief and Plausibility never decrease when moving from the bottom ($\emptyset$) to the top ($X$). In particular, Belief and Plausibility stay constant as long as we move to nodes that do not bring new focal elements into their computation. As a consequence of this property, we can identify regions of connected nodes, each taking a specific value of Belief or Plausibility, as illustrated by Fig. 2.

Fig. 1: The Boolean lattice of item subsets from $X$ with its focal elements [9].

In this example, there are three focal elements, each with its associated basic probability assignment. This leads to 8 groups in the lattice, each with Belief and Plausibility depending on a subset of $\mathcal{F}_m$. Fig. 2 outlines these regions for both Belief and Plausibility. We can observe that all portions of the lattice associated with a given value of Belief or Plausibility are connected.

Fig. 2: Belief (top) and Plausibility (bottom) regions induced by $\mathcal{F}_m$ [9]

Fig. 3: Plausibility levels, panels (a) to (h)

If we sort the Belief (or Plausibility) values in ascending order, we get a sequence of levels, each splitting the nodes into those below the level and those above it. For instance, for a particular assignment of masses to the focal elements, we get the situation depicted in Fig. 3 with respect to Plausibility. The following definitions introduce equivalence classes among subsets with respect to Belief or Plausibility and identify the elements that are most representative of each class.

Definition 6 (Core).

Given a subset $A \subseteq X$, the set of focal elements included in $A$, called the core of $A$, is defined as

$$\mathcal{C}(A) = \{F \in \mathcal{F}_m : F \subseteq A\} \tag{6}$$

Definition 7 (Support).

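Given a subset $A \subseteq X$, the set of focal elements intersecting $A$ (even partially), called the support of $A$, is defined as

$$\mathcal{S}(A) = \{F \in \mathcal{F}_m : F \cap A \neq \emptyset\} \tag{7}$$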

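For instance, the core and the support of each node in Fig. 1 follow directly from the focal elements that the node contains or intersects. It is straightforward that $\mathcal{C}(A) \subseteq \mathcal{S}(A)$ for all $A \subseteq X$, since focal elements are non-empty. The core and the support are the basis for computing, respectively, the Belief and the Plausibility of $A$, since $Bel(A) = \sum_{F \in \mathcal{C}(A)} m(F)$ and $Pl(A) = \sum_{F \in \mathcal{S}(A)} m(F)$. The core and the support group the subsets of $X$ into equivalence classes, as the following definition states.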
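The core and support of Eqs. (6)-(7), and the way they yield Belief and Plausibility, can be checked with a short sketch. The focal elements and masses below are hypothetical (they are not those of Fig. 1), and the helper names `core`, `support`, `bel` and `pl` are ours.

```python
from itertools import chain, combinations

X = frozenset("abcd")
# Hypothetical BPA: {focal element: mass}.
m = {frozenset("a"): 0.2, frozenset("bc"): 0.5, frozenset("cd"): 0.3}


def core(a):
    """Eq. (6): focal elements included in `a`."""
    return frozenset(f for f in m if f <= a)


def support(a):
    """Eq. (7): focal elements intersecting `a`, even partially."""
    return frozenset(f for f in m if f & a)


def bel(a):
    """Belief as the total mass of the core."""
    return sum(m[f] for f in core(a))


def pl(a):
    """Plausibility as the total mass of the support."""
    return sum(m[f] for f in support(a))


A = frozenset("ab")
print(sorted("".join(sorted(f)) for f in core(A)))     # ['a']
print(sorted("".join(sorted(f)) for f in support(A)))  # ['a', 'bc']

# The core is always contained in the support, on every node of the lattice.
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]
assert all(core(s) <= support(s) for s in subsets)
```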

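Definition 8 ($\mathcal{C}$ and $\mathcal{S}$ Equivalence).

Two sets $A, B \subseteq X$ are said to be $\mathcal{C}$-equivalent if and only if $\mathcal{C}(A) = \mathcal{C}(B)$. Given a core $G \subseteq \mathcal{F}_m$, the $\mathcal{C}$-equivalence class is defined as the collection

$$[G]_{\mathcal{C}} = \{A \subseteq X : \mathcal{C}(A) = G\} \tag{8}$$

In addition, $A$ and $B$ are $\mathcal{S}$-equivalent if and only if $\mathcal{S}(A) = \mathcal{S}(B)$. Given a support $G \subseteq \mathcal{F}_m$, the $\mathcal{S}$-equivalence class obtained from this relation is defined as

$$[G]_{\mathcal{S}} = \{A \subseteq X : \mathcal{S}(A) = G\} \tag{9}$$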

Fig. 4(a) provides an example of a $\mathcal{C}$-equivalence class for a given core, and Fig. 4(b) shows the $\mathcal{S}$-equivalence class for a given support.

Fig. 4: Examples of $\mathcal{C}$-equivalence (a) and $\mathcal{S}$-equivalence (b) classes

As an immediate consequence, if $A$ and $B$ are $\mathcal{C}$-equivalent, then $Bel(A) = Bel(B)$, while if they are $\mathcal{S}$-equivalent, then $Pl(A) = Pl(B)$.

$\mathcal{C}$-equivalence classes partition $2^X$, and so do $\mathcal{S}$-equivalence classes. Thus, each subset belongs to exactly one class of each kind. Grouping subsets into $\mathcal{C}$- and $\mathcal{S}$-equivalence classes allows us (i) to explore the lattice by moving across classes, instead of exploring the whole item subset space, and (ii) to choose a representative of each class, so that the list of recommended items is shorter. For instance, we might be interested in using the smallest subset within a $\mathcal{C}$-equivalence class.

As the representative of a $\mathcal{C}$-equivalence class we can take its smallest subset, which we call the minimal. It is possible to prove that each $\mathcal{C}$-equivalence class has a single minimal, namely the union of the focal elements in its core. Conversely, for $\mathcal{S}$-equivalence classes we take as representative the largest subset, which we call the maximal. As for $\mathcal{C}$-equivalence classes, it is possible to prove that any $\mathcal{S}$-equivalence class has a single maximal, namely the complement in $X$ of the union of the focal elements that are not in its support.
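A brute-force sketch of Definition 8 follows. It groups every subset of a small hypothetical lattice by core and by support, then checks that each $\mathcal{C}$-class has a unique minimal equal to the union of its core, and that each $\mathcal{S}$-class has a unique maximal equal to $X$ minus the focal elements outside its support. Enumerating $2^X$ is only feasible for toy universes; it is the combinatorial cost that moving across classes is meant to reduce.

```python
from collections import defaultdict
from itertools import chain, combinations

X = frozenset("abcd")
FOCAL = [frozenset("a"), frozenset("bc"), frozenset("cd")]  # hypothetical focal elements


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def core(a):
    return frozenset(f for f in FOCAL if f <= a)


def support(a):
    return frozenset(f for f in FOCAL if f & a)


def equivalence_classes(key):
    """Group the lattice 2^X by core (C-classes) or by support (S-classes)."""
    classes = defaultdict(list)
    for a in powerset(X):
        classes[key(a)].append(a)
    return dict(classes)


c_classes = equivalence_classes(core)
s_classes = equivalence_classes(support)

for g, members in c_classes.items():
    minimal = min(members, key=len)
    # Unique minimal: inside every member, and equal to the union of the core.
    assert all(minimal <= a for a in members)
    assert minimal == frozenset().union(*g)

for g, members in s_classes.items():
    maximal = max(members, key=len)
    # Unique maximal: contains every member, and avoids all focal elements outside the support.
    outside = [f for f in FOCAL if f not in g]
    assert all(a <= maximal for a in members)
    assert maximal == X - frozenset().union(*outside)

print(len(c_classes), len(s_classes))  # 8 8
```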

III Model

In the context of our interest, we take $X$ as the set of items belonging to the content catalog and $U$ as the set of users.

Both sets are projected onto two feature spaces, made of $p$ and $q$ dimensions respectively. The first refers to the characteristics $C_1, \dots, C_p$ describing the items in $X$, while the second refers to the user profiling features $Q_1, \dots, Q_q$. Both spaces are discrete, so that each $C_i$ and $Q_j$ can take a finite number of values.

The relationship between items and users is expressed by a choice matrix, such as the one shown in Tab. I. The choice matrix is placed side by side with the item characteristics matrix (left side) and the profile matrix (top).

                        Profile matrix (users 1, 2, 3, ..., m)
Item characteristics    Choice matrix
TABLE I: Structure of dataset assumed by the model

In general, the data points $c_i(x)$ and $q_j(u)$ are multi-valued, meaning that they are represented by sets of values. For instance, if $C_i$ represents the movie cast, $c_i(x)$ is the list of actors featuring in movie $x$. Similarly, if $Q_j$ is "interests", $q_j(u)$ lists what user $u$ is interested in. In other cases they are single-valued, as for characteristics such as "director" and "year" or for profiling features such as "age" or "location". An example of this dataset is given in Tab. II.

User        1               2        3        4
Age         30s             30s      20s      40s
Gender      M               F        M        M
Location    IT              IT       SP       IT
Interests   Movies, Books   Sport    Books    Music, Sport

Row  Director   Year  Stars                                         Genre
0    Boyle      1996  Ewan McGregor, Ewen Bremner                   Drama
1    Levinson   1996  Robert De Niro, Kevin Bacon, Brad Pitt        Crime, Drama, Thriller
2    Scorsese   2015  Robert De Niro, Leonardo DiCaprio, Brad Pitt  Short, Comedy
3    Scorsese   1990  Robert De Niro, Ray Liotta, Joe Pesci         Biography, Crime, Drama
4    Boyle      2000  Leonardo DiCaprio                             Adventure, Drama, Romance
5    Howard     1995  Tom Hanks, Kevin Bacon                        Adventure, Drama, History
6    Zemeckis   1994  Tom Hanks                                     Comedy, Drama
7    Zemeckis   1985  Michael J. Fox, Christopher Lloyd             Adventure, Sci-Fi
8    Edwards    2016  Felicity Jones, Diego Luna                    Adventure, Sci-Fi
9    Scott      2015  Matt Damon                                    Adventure, Drama, Sci-Fi
TABLE II: The dataset used as example.

Let us denote with $\Omega_{C_i}$ the overall set of values taken by the item characteristic $C_i$, and with $\Omega_{Q_j}$ the overall set of values of the user profiling feature $Q_j$. They are given in Tab. III and Tab. IV, respectively.

Director: Boyle, Levinson, Scorsese, Howard, Zemeckis, Edwards, Scott
Year: 1996, 2015, 1990, 2000, 1995, 1994, 1985, 2016
Stars: Ewan McGregor, Ewen Bremner, Ray Liotta, Robert De Niro, Kevin Bacon, Brad Pitt, Leonardo DiCaprio, Joe Pesci, Tom Hanks, Michael J. Fox, Christopher Lloyd, Felicity Jones, Diego Luna, Matt Damon
Genre: Crime, Drama, Thriller, Short, Comedy, Biography, Adventure, Romance, History, Sci-Fi
TABLE III: Overall sets of item characteristics
Age: 20s, 30s, 40s
Gender: M, F
Location: IT, SP
Interests: Books, Movies, Sport, Music
TABLE IV: Overall sets of user profiling features

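Since here we are interested in using information about both the item characteristics and the user profiles, we compute, for any $A \subseteq \Omega_D$,

$$m_D(A) = \frac{|P_A|}{|P|} \tag{10}$$

where

  • $D$ is either an item characteristic $C_i$ or a profiling feature $Q_j$, with $\Omega_D$ being its overall set;

  • $P$ is the set of preferences ("likes");

  • $P_A \subseteq P$ is the subset of preferences referred to $A$, i.e., the likes involving items (or users) whose value for $D$ is exactly $A$.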

It is easy to prove that $m_D(\emptyset) = 0$ and $\sum_{A \subseteq \Omega_D} m_D(A) = 1$. Given the preferences in our example, some examples of masses assigned to characteristics are given below.

  • Stars:

  • Director:

  • Year:

  • Genre:

If we refer to profiling features, some examples are the following:

  • Age:

  • Gender:

  • Location:

  • Interests:

We notice that the focal elements of each dimension are given by its unique values, i.e., by the rows after removing duplicates. For instance, for the "Director" and "Year" dimensions, the focal elements are the director names, i.e., Boyle, Levinson, Scorsese, Howard, Zemeckis, Edwards and Scott, and the years, i.e., 1985, 1990, 1994, 1995, 1996, 2000, 2015, 2016. The same holds for "Age", i.e., 20s, 30s, 40s, "Location", i.e., IT, SP, and "Gender", i.e., M, F. These are all single-valued dimensions, whose focal elements are singletons. In this case the model becomes additive, i.e., Belief and Plausibility coincide. For instance,

$$Bel(\{\text{Boyle}, \text{Scott}\}) = Pl(\{\text{Boyle}, \text{Scott}\}) = m(\{\text{Boyle}\}) + m(\{\text{Scott}\})$$

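Instead, the dimensions "Genre" and "Stars" are multi-valued, so their focal elements are not singletons. For instance, Drama, Comedy-Drama and Adventure-Drama-History are three focal elements of "Genre". Similarly, Movies-Books, Books, Sport and Music-Sport are focal elements of "Interests" among the profiling features. For multi-valued dimensions the model is not additive. As an example, let us consider the belief of Adventure-Comedy-Sci-Fi-Drama. We have

$$Bel(\{\text{Adventure}, \text{Comedy}, \text{Sci-Fi}, \text{Drama}\}) = m(\{\text{Drama}\}) + m(\{\text{Comedy}, \text{Drama}\}) + m(\{\text{Adventure}, \text{Sci-Fi}\}) + m(\{\text{Adventure}, \text{Drama}, \text{Sci-Fi}\})$$

since these are the only focal elements of "Genre" included in the set.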

Conversely, we have

$$Pl(\{\text{Adventure}, \text{Comedy}, \text{Sci-Fi}, \text{Drama}\}) = 1$$

because all focal elements of "Genre" are involved in its computation.
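The following sketch computes the masses of Eq. (10) for the "Genre" and "Director" columns of Tab. II and reproduces the two behaviors discussed above: additivity for a single-valued dimension, and $Bel < Pl = 1$ for the genre set Adventure-Comedy-Sci-Fi-Drama. The preferences `LIKES` are hypothetical (one like per item, given by arbitrary users), so the resulting numbers are illustrative only; $P_A$ is taken as the likes on items whose value is exactly $A$.

```python
from collections import Counter

# Item rows 0-9 from Tab. II ("Genre" and "Director" columns).
GENRE = {
    0: {"Drama"}, 1: {"Crime", "Drama", "Thriller"}, 2: {"Short", "Comedy"},
    3: {"Biography", "Crime", "Drama"}, 4: {"Adventure", "Drama", "Romance"},
    5: {"Adventure", "Drama", "History"}, 6: {"Comedy", "Drama"},
    7: {"Adventure", "Sci-Fi"}, 8: {"Adventure", "Sci-Fi"},
    9: {"Adventure", "Drama", "Sci-Fi"},
}
DIRECTOR = {0: {"Boyle"}, 1: {"Levinson"}, 2: {"Scorsese"}, 3: {"Scorsese"},
            4: {"Boyle"}, 5: {"Howard"}, 6: {"Zemeckis"}, 7: {"Zemeckis"},
            8: {"Edwards"}, 9: {"Scott"}}

# HYPOTHETICAL preferences P: (item row, user) pairs, one like per item.
LIKES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 2), (6, 3), (7, 4), (8, 1), (9, 2)]


def masses(feature, likes):
    """Eq. (10): m_D(A) = |P_A| / |P|, with P_A the likes on items whose value is exactly A."""
    counts = Counter(frozenset(feature[row]) for row, _user in likes)
    return {a: n / len(likes) for a, n in counts.items()}


def belief(m, a):
    return sum(v for f, v in m.items() if f <= a)


def plausibility(m, a):
    return sum(v for f, v in m.items() if f & a)


m_genre = masses(GENRE, LIKES)
query = frozenset({"Adventure", "Comedy", "Sci-Fi", "Drama"})
print(round(belief(m_genre, query), 3), round(plausibility(m_genre, query), 3))  # 0.5 1.0

m_dir = masses(DIRECTOR, LIKES)
pair = frozenset({"Boyle", "Scott"})
print(belief(m_dir, pair) == plausibility(m_dir, pair))  # True: singletons make the model additive
```

Whatever likes are used, the plausibility of this genre set stays 1, because every genre value in Tab. II intersects it.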

So far, we have considered each dimension in isolation. Each of them provides a probability range $[Bel(A), Pl(A)]$, with $A \subseteq \Omega_D$, that measures the likelihood that a content in $X$ characterized by $A$ will be enjoyed by the users in $U$, if $D$ refers to some item characteristic $C_i$. If, instead, $D$ refers to some profiling feature $Q_j$, it measures the likelihood that a user characterized by $A$ will enjoy the catalog of contents offered by means of $X$.

If we want to look at multiple dimensions, we cannot use Dempster's combination rule as described in Section II. The main issue is that the dimensions belong to different domains, so the information fusion given by Eq. (4) cannot be performed over comparable sets. This problem can be solved by looking at focal elements as representatives of preferences over the choice matrix. Let $A \subseteq \Omega_{D_1}$ and $B \subseteq \Omega_{D_2}$ be two features defined over different dimensions $D_1$ and $D_2$. We can combine the two by means of conjunction or disjunction, depending on the semantics we associate with the operation.

Thus, in order to combine $A$ and $B$ we need to look at $P_A$ and $P_B$. In the case of conjunction $A \wedge B$, we have $P_{A \wedge B} = P_A \cap P_B$, so that

$$m(A \wedge B) = \frac{|P_A \cap P_B|}{|P|} \tag{11}$$

For instance, if $A$ is Zemeckis and $B$ is Adventure-Sci-Fi, $P_A$ is made of the preferences at rows 6 and 7, while $P_B$ is made of those at rows 7 and 8, so that $P_{A \wedge B}$ is made only of row 7, and $m(A \wedge B) = |P_{A \wedge B}|/|P|$. Once we have the focal elements for the conjunction of "Director" and "Genre", we can compute belief and plausibility over the conjunction of the two, for instance for Zemeckis and Drama.

The resulting value is the likelihood that a drama directed by Zemeckis will be enjoyed by the users in $U$, which in our example is exactly 1/15.
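Eq. (11) can be sketched by representing each preference as an (item row, user) pair, so that $P_A$ and $P_B$ become Python sets and the conjunction is their intersection. Item data come from Tab. II, while the likes are hypothetical (one per item) and the helper names `pref_set` and `conj_mass` are ours. With one like per item, the rows of $P_A$ and $P_B$ match those of the example (6-7 and 7-8), but the resulting mass depends on the invented likes.

```python
# Item rows 0-9 from Tab. II ("Director" and "Genre" columns).
DIRECTOR = {0: {"Boyle"}, 1: {"Levinson"}, 2: {"Scorsese"}, 3: {"Scorsese"},
            4: {"Boyle"}, 5: {"Howard"}, 6: {"Zemeckis"}, 7: {"Zemeckis"},
            8: {"Edwards"}, 9: {"Scott"}}
GENRE = {0: {"Drama"}, 1: {"Crime", "Drama", "Thriller"}, 2: {"Short", "Comedy"},
         3: {"Biography", "Crime", "Drama"}, 4: {"Adventure", "Drama", "Romance"},
         5: {"Adventure", "Drama", "History"}, 6: {"Comedy", "Drama"},
         7: {"Adventure", "Sci-Fi"}, 8: {"Adventure", "Sci-Fi"},
         9: {"Adventure", "Drama", "Sci-Fi"}}

# HYPOTHETICAL preferences P: (item row, user) pairs, one like per item.
LIKES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 2), (6, 3), (7, 4), (8, 1), (9, 2)]


def pref_set(values, a, likes):
    """P_A: the likes on items whose value for the dimension is exactly A."""
    return {like for like in likes if frozenset(values[like[0]]) == frozenset(a)}


def conj_mass(p_a, p_b, likes):
    """Eq. (11): m(A and B) = |P_A intersect P_B| / |P|."""
    return len(p_a & p_b) / len(likes)


p_a = pref_set(DIRECTOR, {"Zemeckis"}, LIKES)
p_b = pref_set(GENRE, {"Adventure", "Sci-Fi"}, LIKES)
print(sorted(r for r, _ in p_a), sorted(r for r, _ in p_b))  # [6, 7] [7, 8]
print(sorted(r for r, _ in p_a & p_b), conj_mass(p_a, p_b, LIKES))  # [7] 0.1
```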

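The other way of combining two dimensions is by means of disjunction. In this case $P_{A \vee B} = P_A \cup P_B$ and

$$m(A \vee B) = \frac{|P_A \cup P_B|}{|P|} \tag{12}$$

For instance, with regard to profiling features, if $A$ is 20s and $B$ is Sport, we have $m(A \vee B) = |P_A \cup P_B|/|P|$, as the disjunction of the two collects rows 0, 2, 3, 4, 5, 8 and 9. In this case, both belief and plausibility are at least as large as those of each feature taken alone.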
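Eq. (12) differs only in taking the union. The sketch below applies it to profiling features, using the user profiles of Tab. II (splitting the flattened "Interests" row into per-user sets is our reading of the table) and hypothetical likes, so the collected rows differ from those of the example above. As an assumption consistent with Eq. (10), $P_{\text{Sport}}$ only contains likes by users whose interests are exactly {Sport}.

```python
# User profiles from Tab. II (users 1-4). The per-user split of the
# "Interests" row is our reading of the flattened table.
AGE = {1: {"30s"}, 2: {"30s"}, 3: {"20s"}, 4: {"40s"}}
INTERESTS = {1: {"Movies", "Books"}, 2: {"Sport"}, 3: {"Books"}, 4: {"Music", "Sport"}}

# HYPOTHETICAL preferences P: (item row, user) pairs, one like per item.
LIKES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 2), (6, 3), (7, 4), (8, 1), (9, 2)]


def pref_set_by_user(profile, a, likes):
    """P_A for a profiling feature: likes by users whose value is exactly A (assumed)."""
    return {like for like in likes if frozenset(profile[like[1]]) == frozenset(a)}


def disj_mass(p_a, p_b, likes):
    """Eq. (12): m(A or B) = |P_A union P_B| / |P|."""
    return len(p_a | p_b) / len(likes)


p_20s = pref_set_by_user(AGE, {"20s"}, LIKES)
p_sport = pref_set_by_user(INTERESTS, {"Sport"}, LIKES)
print(sorted(r for r, _ in p_20s | p_sport))  # [1, 2, 5, 6, 9]
print(disj_mass(p_20s, p_sport, LIKES))       # 0.5
```

Here the two features alone have masses 0.2 and 0.3, and their disjunction reaches 0.5, in line with the remark that combining by disjunction can only enlarge the result.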

IV Applications in Media Industry

The model presented so far can be employed for different tasks. We briefly outline some of them below.

Recommendations. The model can be used to suggest contents to a user according to each dimension. For instance, once the "Director" dimension is chosen, the system might suggest the directors most likely to be of interest to the user. It is also possible to combine different dimensions, for example "Genre" and "Year". In any case, the inference of preferences is performed without distinguishing among users, meaning that profile information is not taken into account.
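As a hedged illustration of the recommendation task, the sketch below ranks single genres by their interval $[Bel(\{g\}), Pl(\{g\})]$, pooling all likes regardless of who gave them. For a single-valued dimension such as "Director" the interval collapses to the mass itself, so the multi-valued "Genre" column of Tab. II is used instead. The likes are hypothetical, and sorting by plausibility first, then by belief, is only one of the ranking choices discussed below.

```python
from collections import Counter

# "Genre" column of Tab. II, item rows 0-9.
GENRE = {0: {"Drama"}, 1: {"Crime", "Drama", "Thriller"}, 2: {"Short", "Comedy"},
         3: {"Biography", "Crime", "Drama"}, 4: {"Adventure", "Drama", "Romance"},
         5: {"Adventure", "Drama", "History"}, 6: {"Comedy", "Drama"},
         7: {"Adventure", "Sci-Fi"}, 8: {"Adventure", "Sci-Fi"},
         9: {"Adventure", "Drama", "Sci-Fi"}}

# HYPOTHETICAL preferences P: (item row, user) pairs, one like per item.
LIKES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 2), (6, 3), (7, 4), (8, 1), (9, 2)]


def masses(feature, likes):
    """Eq. (10): share of likes on items whose value is exactly each focal element."""
    counts = Counter(frozenset(feature[row]) for row, _user in likes)
    return {a: n / len(likes) for a, n in counts.items()}


def genre_intervals(m):
    """[Bel({g}), Pl({g})] for every single genre g occurring in a focal element."""
    out = {}
    for g in sorted(set().union(*m)):
        s = frozenset({g})
        out[g] = (sum(v for f, v in m.items() if f <= s),  # Bel: focal elements inside {g}
                  sum(v for f, v in m.items() if f & s))   # Pl: focal elements containing g
    return out


intervals = genre_intervals(masses(GENRE, LIKES))
# Rank by plausibility, breaking ties by belief (one possible choice).
ranking = sorted(intervals, key=lambda g: (intervals[g][1], intervals[g][0]), reverse=True)
for g in ranking[:3]:
    bel, pl = intervals[g]
    print(f"{g:<10} Bel={bel:.1f}  Pl={pl:.1f}")
```

Note how wide the intervals are: only "Drama" appears alone as a genre value, so it is the only genre with a non-zero belief.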

Audience targeting. In this case, given a single content, we are interested in finding the user profiles that might be interested in it. For example, given a new movie, the model might estimate how likely it is to be of interest for each age range. Also in this case it is possible to combine multiple dimensions that are user profiling features, for instance considering multiple age ranges while taking into account the different genders.

Content bundling. This application aims to propose a bundle of contents to a group of users, possibly with different profiles. The model can suggest such a bundle through a combination of dimensions among characteristics and profiling features. The process can be driven by two different perspectives. The first starts from the bundle of contents and aims at identifying a group of users that might be interested in it. For instance, given all drama movies of the 90s, which users could be interested in such an offer? But it is also possible to move the other way round: given a group of users, which bundle of contents might be of interest to them? With respect to our example, given users in their 20s who are interested in books, which bundle of contents is likely to be of interest to them?

Segmentation. This is a generalization of the problem above. In this case both users and contents are objects of the analysis. We are interested in finding clusters of users, of contents, and of users and contents together that maximize the likelihood of preferences within each group and minimize it between groups. For instance, in our example we could be interested in seeing whether there are users with different profiles who are likely to enjoy the same contents, whether there are contents that have a similar likelihood of being enjoyed by the audience, or whether there are groups of users that are likely to enjoy the same group of contents, among others.

In all the tasks above, the possibility of comparing and ranking alternatives plays a key role. However, D-S theory provides only an imprecise probability, ranging between a lower bound given by the degree of belief and an upper bound given by the degree of plausibility. This issue can be addressed by different approaches.

The first approach is to use a single degree representative of the range, such as the midpoint between belief and plausibility. Another possibility is to use only belief degrees (conservative approach) or only plausibility degrees (challenging approach). Another approach could be to randomly draw pairs of values from the two ranges and use the majority of the pairwise comparisons to decide the order of two alternatives. It is also possible to choose an alternative randomly when the two cannot be sorted. Finally, it is possible to look at other solutions investigated in the field of partial order theory.
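The ranking strategies listed above can be sketched as follows. The strategy labels mirror the text ("conservative" for belief only, "challenging" for plausibility only), while drawing values uniformly within each interval for the pairwise majority is our assumption, since the text does not specify a distribution. The intervals and content names are hypothetical.

```python
import random


def score(interval, strategy):
    """Collapse a [Bel, Pl] interval to one number."""
    bel, pl = interval
    if strategy == "midpoint":
        return (bel + pl) / 2
    if strategy == "conservative":  # belief only
        return bel
    if strategy == "challenging":   # plausibility only
        return pl
    raise ValueError(f"unknown strategy: {strategy}")


def rank(intervals, strategy):
    return sorted(intervals, key=lambda k: score(intervals[k], strategy), reverse=True)


def pairwise_majority(iv_a, iv_b, trials=1000, rng=None):
    """Draw one value uniformly from each interval, repeat, and let the majority decide.

    Returns 1 if A wins, -1 if B wins, 0 on a tie (the pair cannot be sorted,
    and the caller may then pick one at random).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    wins = 0
    for _ in range(trials):
        a = rng.uniform(*iv_a)
        b = rng.uniform(*iv_b)
        wins += (a > b) - (a < b)
    return (wins > 0) - (wins < 0)


# Hypothetical [Bel, Pl] intervals for three contents.
intervals = {"x1": (0.10, 0.70), "x2": (0.30, 0.40), "x3": (0.00, 0.50)}
for s in ("midpoint", "conservative", "challenging"):
    print(s, rank(intervals, s))
print(pairwise_majority(intervals["x1"], intervals["x2"]))
```

On these intervals the three point strategies already disagree: the midpoint ranks x1, x2, x3, belief alone ranks x2, x1, x3, and plausibility alone ranks x1, x3, x2, which is why the choice of strategy matters.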

V Conclusions

In this paper, we further investigated a preference model based on the Dempster-Shafer theory and its application to the media industry. This work extends what has been done so far by introducing some novel elements, among them the possibility of including the user profile in the inference, instead of treating users neutrally, with respect to the applications and problems discussed in the previous section. There are still some issues to address. The most important concerns the scalability of the model. Indeed, D-S theory is inherently combinatorial, so the search space explodes as more elements are included in the overall sets $\Omega_D$ of the dimensions. Defining equivalence classes in terms of belief and plausibility is a way to reduce complexity, but work still has to be done to make this solution feasible in practice. In addition, the model presented here needs to be validated. This can be done by looking at the correspondence between the probability ranges and the frequency of positive votes recorded afterwards. In the future we aim to develop the model further, in order to include more complex queries and to solve issues regarding the application of the model in practice to large catalogs and audiences.

References

  • [1] C. A. Gomez-Uribe and N. Hunt, “The Netflix recommender system: Algorithms, business value, and innovation,” ACM Trans. Manage. Inf. Syst., vol. 6, no. 4, pp. 13:1–13:19, Dec. 2015. doi: 10.1145/2843948. [Online]. Available: http://doi.acm.org/10.1145/2843948
  • [2] J. Salter and N. Antonopoulos, “CinemaScreen recommender agent: Combining collaborative and content-based filtering,” IEEE Intelligent Systems, vol. 21, no. 1, pp. 35–41, Jan. 2006. doi: 10.1109/MIS.2006.4. [Online]. Available: http://dx.doi.org/10.1109/MIS.2006.4
  • [3] L. Candillier, F. Meyer, and M. Boullé, “Comparing state-of-the-art collaborative filtering systems,” pp. 548–562, 2007.
  • [4] M. J. Pazzani, “A framework for collaborative, content-based and demographic filtering,” Artif. Intell. Rev., vol. 13, no. 5-6, pp. 393–408, Dec. 1999. doi: 10.1023/A:1006544522159. [Online]. Available: http://dx.doi.org/10.1023/A:1006544522159
  • [5] R. Burke, “Knowledge-based recommender systems,” in Encyclopedia of Library and Information Science, vol. 69, A. Kent, Ed. Taylor and Francis, 2000, pp. 180–201.
  • [6] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez, “Recommender systems survey,” Knowledge-Based Systems, vol. 46, pp. 109–132, 2013. doi: 10.1016/j.knosys.2013.03.012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950705113001044
  • [7] J. J. Castro-Schez, R. Miguel, D. Vallejo, and L. M. López-López, “A highly adaptive recommender system based on fuzzy logic for B2C e-commerce portals,” Expert Systems with Applications, vol. 38, no. 3, pp. 2441–2454, 2011. doi: 10.1016/j.eswa.2010.08.033
  • [8] K. Zhang and H. Li, “Fusion-based recommender system,” in Information Fusion (FUSION), 2010 13th Conference on, July 2010, pp. 1–7. doi: 10.1109/ICIF.2010.5712091
  • [9] L. Troiano, L. J. Rodríguez-Muñiz, and I. Díaz, “Discovering user preferences using Dempster-Shafer theory,” Fuzzy Sets and Systems, vol. 278, pp. 98–117, 2015. doi: 10.1016/j.fss.2015.06.004
  • [10] A. P. Dempster, “Upper and lower probabilities induced by a multivalued mapping,” Annals of Mathematical Statistics, vol. 38, pp. 325–339, 1967.
  • [11] G. Shafer, A Mathematical Theory of Evidence. Princeton: Princeton University Press, 1976.