Designing and Connectivity Checking of Implicit Social Networks from the User-Item Rating Data

04/05/2020 ∙ by Suman Banerjee, et al. ∙ IIT Gandhinagar

An implicit social network is a connected social structure among a group of persons, where two of them are linked if they share some common interest. One real-life example of such networks is the implicit social network among the customers of an online commercial house, where there exists an edge between two customers if they like similar items. Such networks are often useful for commercial applications such as targeted advertisement and viral marketing. In this article, we study two fundamental problems in this direction. The first is: given the user-item rating data of an E-Commerce house, how can we design an implicit social network among its users? The second is: can we obtain the connectivity information among the users while designing the network? Formally, we call the first problem the Implicit User Network Design Problem and the second the Implicit User Network Design with Connectivity Checking Problem. For the first problem, we propose three algorithms, namely 'Exhaustive Search Approach', 'Clique Addition Approach', and 'Matrix Multiplication-Based Approach'. For the second problem, we propose two approaches. The first is a sequential approach: designing and then connectivity checking; the other is a concurrent approach, essentially an incremental algorithm that performs designing and connectivity checking simultaneously. The proposed methodologies have been evaluated on three publicly available rating network datasets: Flixter, Movielens, and Epinions.


1 Introduction

A social network is an interconnected structure among a group of agents that is formed for social interactions Wasserman et al. (1994). Here, agents may be, for example, the customers of a commercial house or researchers, and their relationships may be friendship or co-authorship, respectively. Such networks are nowadays open platforms where information, rumors, ideas, innovations, etc. spread widely and rapidly. The uses of social networks range from predicting customer behavior Goel and Goldstein (2013) to understanding SMS wormhole propagation Xiao et al. (2017). One of the important phenomena in social networks is information diffusion: a user who has some information tends to share it with his or her neighbors, and thus the information propagates from one part of the network to another. This phenomenon has been exploited by E-Commerce houses and has found potential applications in viral marketing Chen et al. (2010), computational advertisement Huh (2017), personalized recommendation Zhang et al. (2013), finding influential Twitter users Riquelme and González-Cantergiani (2016), feed ranking Bonchi et al. (2013), and so on. Due to these commercial applications, the last one and a half decades have witnessed significant interest in mining and analyzing social networks. See Aggarwal and Subbian (2014) and Al-Garadi et al. (2018) for recent surveys.

Based on the design methodology, social networks are of two types: explicit social networks (e.g., Twitter, Facebook), where users choose their friends themselves, and implicit social networks Iosup et al. (2014) (e.g., Epinions, Flixter), where people are connected based on common interest; i.e., two users are linked if they have rated (or liked or searched for) similar items. For commercial applications of social networks such as viral marketing Domingos (2005), computational advertisement, and item recommendation Yang et al. (2013), prior knowledge of the users' online behavior is important. For these commercial activities, the implicit social network is sometimes preferred over the explicit one for two reasons. First, in an implicit social network, users are connected based on similar item preferences; hence, a neighbor's preferences can be exploited to predict the preferences of a user whose identity is unknown. Second, the network is designed and maintained by the E-Commerce house itself, so it is completely accessible to them Hill et al. (2005). Therefore, an important question is how to design the implicit social network in a given context. Surprisingly, the literature in this direction is very limited. To the best of the authors' knowledge, other than Podobnik and Lovrek (2015) and van de Bovenkamp et al. (2014), no study deals with this problem. From the E-Commerce house's perspective, one viable source of data in which interactions between users and items are recorded is the user-item rating data. In this paper, we initiate the study of designing the implicit user network¹ from rating data.

¹ As this study is concerned with designing the implicit social network whose users are the customers of an E-Commerce house, in the rest of the paper we use the two terms 'implicit user network' and 'implicit social network' interchangeably.

Knowledge of the structure of the implicit user network is important for many commercial applications. Consider an E-Commerce house doing viral marketing for a newly launched product. For this purpose, it distributes a number of sample items to influential users, hoping that a significant number of them will like the item and start sharing the message among their friends in the network. This diffusion continues, and eventually the majority of the users come to know about the item. The key issue in this context is which users should be chosen initially to start the information diffusion. This problem is popularly known as the influence maximization problem, and the users who initiate the diffusion process are called 'seed users' or 'seed nodes' Banerjee et al. (2020). It is important to observe that the influence of a seed user cannot go beyond the connected component to which it belongs. So, the component information of the implicit user network should be exploited during seed set selection for influence maximization. Hence, in this context, not only designing but also connectivity checking is an important problem. In this paper, along with designing the implicit user network, we also study the problem of designing it together with connectivity checking. In particular, we make the following contributions:

  • We propose the problem of designing and connectivity checking of implicit user network from the user-item rating data.

  • For the Implicit User Network Design Problem, we propose three different approaches, namely the exhaustive search approach, the clique addition approach, and the matrix multiplication-based approach.

  • For the Implicit User Network Design with Connectivity Checking Problem, we propose two approaches. The first is a sequential approach: designing and then connectivity checking; the second is a concurrent approach: an incremental algorithm that performs designing and connectivity checking simultaneously.

  • All the algorithms presented in this paper have been analyzed to determine their time and space requirements.

  • The proposed algorithms have been implemented on three publicly available user-item rating datasets, and an extensive set of experiments has been conducted to understand their efficiency. To investigate scalability, the algorithms are executed on inputs of increasing size.

The rest of this article is organized as follows: Section 2 reviews relevant studies from the literature. Section 3 describes some preliminary concepts and formally defines both problems. The proposed methodologies for both problems are described in Section 4. In Section 5, the proposed methodologies are evaluated, and finally, Section 6 concludes this study and provides future research directions.

2 Related Work

In this section, some relevant works from the existing literature have been described. This section is broadly divided into two parts. In the first part, we report literature related to the design and analysis of implicit social networks, whereas in the second part we do the same for different applications of the implicit social network.

2.1 Design and Analysis of Implicit Social Network

Gupte and Eliassi-Rad (2012) proposed an axiomatic framework for measuring the connectedness and tie strength of an implicit social network. Their methodology is also helpful for inferring implicit relations among a set of people by tie strength. Li et al. (2017) proposed a multi-task low-rank linear influence model for detecting influential nodes in an implicit social network. Iosup et al. (2014) proposed a methodology for designing an implicit social network from real-world data collected from three different game genres, which benefits both players and game operators. Podobnik and Lovrek (2015) showed that an implicit social network designed from users' online behavior is able to predict hidden relationships among them. Taheri et al. (2017) proposed a methodology for extracting implicit social relationships based on rating prediction using the Hellinger distance. They performed social recommendation on this network, and their experimental results show that using implicit user relations in social recommendation methods yields almost the same preferences as using explicit trust values. Zhang et al. (2014) proposed a methodology to design an implicit brand network from a dataset of users' historical activities on a social media platform. Their experiments answer many interesting research questions about the topology of the brand network, the number of users in an influential brand, etc. Song et al. (2010) proposed a novel methodology for extracting hidden implicit social relationships from messaging cascades. Nauerz and Groh (2008) proposed a methodology for deriving an implicit social network among the users of a web portal and showed that this network can enhance interaction and collaboration in a community.

2.2 Applications of Implicit Social Network

As reported in the literature, implicit social networks are useful for designing and improving recommender systems, designing social markets, link prediction in social networks, and so on. Reafee et al. (2016) showed that implicit social network data can be used to improve the recommendation accuracy of social recommender systems. Lin et al. (2014) proposed a novel personalized news recommendation framework using implicit social experts, which provides better recommendation accuracy, specifically for cold-start users. Tuarob and Tucker (2015) developed a product feature inference model for mining implicit customer preferences within a large-scale social media network. Frey et al. (2011) proposed a novel methodology for designing a social market by combining explicit and implicit social relationships. Roth et al. (2010) proposed an interaction-based metric for measuring the affinity of a particular user of the network to other groups; for creating the groups, they used the user's implicit social graph. Their results demonstrate the importance of implicit social relationships as well as affinity-based ranking. Ma et al. (2011) proposed a novel probabilistic factor analysis framework that incorporates implicit social relationships for recommendation. Since then, several works have improved recommendation accuracy using implicit social relationships Lin et al. (2012), Chen et al. (2012). Tasnádi and Berend (2015) proposed a methodology for solving the link prediction problem based on implicit user information from the network. Alsaleh et al. (2011) proposed a hybrid social matching system for recommendation that uses both explicit and implicit relationships among users. Their results show that the accuracy of the matching process increases if implicit data is considered.

To the best of the authors' knowledge, there is no existing study that constructs an implicit social network from user-item rating data. In this paper, we study two related problems in this direction.

3 Preliminaries and Problem Definition

In this section, we present some preliminary concepts related to this study and formally describe the implicit user network design problem and the implicit user network design with connectivity checking problem. In our study, all graphs are simple, finite, and undirected. A graph is denoted by $G(V, E)$, where $V$ and $E$ are its sets of vertices and edges, respectively. For any vertex $v \in V$, we denote its set of neighbors by $N(v)$, and the cardinality of the neighborhood is its degree, i.e., $\deg(v) = |N(v)|$. Two vertices $u$ and $v$ are adjacent if the edge $(uv)$ is present in $E$. A graph is bipartite if its vertex set can be partitioned into two parts such that no two vertices of the same part are adjacent. A graph is connected if there exists a path between every pair of its vertices; a graph that is not connected consists of more than one connected component. For more background on graph theory, please refer to Diestel (2000). Next, we define the user-item rating data.

Definition 1 (User-Item Rating Data)

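This is a weighted bipartite graph $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$, where $\mathcal{U}$ and $\mathcal{I}$ are the sets of users and items present in the system, respectively. $(u_a i_j) \in \mathcal{E}$ if and only if user $u_a$ has rated item $i_j$. $\mathcal{W}: \mathcal{E} \rightarrow \mathbb{R}$ is the edge weight function that assigns each edge its rating value, i.e., $\mathcal{W}(u_a i_j) = r_{aj}$, the rating given by $u_a$ to $i_j$.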

In our study, we work with user-item rating datasets in which ratings are binary. We denote the number of users and items in the system by $n$ and $m$, respectively. Traditionally, this data can be represented by a bi-adjacency matrix $\mathcal{B}$ of size $n \times m$, whose $(a, j)$-th entry $\mathcal{B}_{aj}$ is 1 if user $u_a$ has rated item $i_j$ and 0 otherwise. However, real-world rating datasets are represented as a collection of tuples of the form $(u_a, i_j, r_{aj})$, meaning that user $u_a$ has rated item $i_j$ with rating value $r_{aj}$. As we are working with binary rating datasets, the entry $r_{aj}$ is omitted. It is easy to observe that for any $u \in \mathcal{U}$, $N_{\mathcal{D}}(u) \subseteq \mathcal{I}$, and for any $i \in \mathcal{I}$, $N_{\mathcal{D}}(i) \subseteq \mathcal{U}$. For any positive integer $k$, let $[k]$ denote the set $\{1, 2, \ldots, k\}$. As we deal with two different graphs in this paper², for clarity we use the symbol of the graph as a subscript for the neighborhood and degree. For example, for any user $u \in \mathcal{U}$, $N_G(u)$ denotes the set of other users to which $u$ is directly connected in $G$, and $N_{\mathcal{D}}(u)$ denotes the set of items that $u$ has rated. Next, we define the implicit user network.

² In the rest of the paper, the words 'graph' and 'network' are used interchangeably.

Definition 2 (Implicit User Network)

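An implicit user network corresponding to a user-item rating data $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ is an undirected, unweighted graph $G(V, E)$ whose vertex set is the set of users in $\mathcal{D}$, and in which there is an edge between two users if at least one item has been rated by both of them, i.e., $V = \mathcal{U}$, and for all $u_a, u_b \in \mathcal{U}$ with $a \neq b$, $(u_a u_b) \in E$ if and only if $N_{\mathcal{D}}(u_a) \cap N_{\mathcal{D}}(u_b) \neq \emptyset$.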

Figure 1 shows an example of a user-item rating data and its corresponding implicit user network.

Figure 1: A toy example of user-item rating data and its implicit user network.
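To make Definitions 1 and 2 concrete, the following Python sketch builds the binary bi-adjacency matrix $\mathcal{B}$ from (user, item) tuples and then derives the edges of $G$ directly from Definition 2 by intersecting rated-item sets. It is only an illustration under stated assumptions: users and items are assumed to be already mapped to 0-based integer indices, and the four-user toy dataset is hypothetical, not the data of Figure 1.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def build_biadjacency(ratings: Iterable[Tuple[int, int]], n: int, m: int) -> List[List[int]]:
    """Return the n x m binary bi-adjacency matrix B, with B[a][j] = 1 iff user a rated item j.

    Assumption: users and items are already mapped to 0-based indices.
    """
    B = [[0] * m for _ in range(n)]
    for a, j in ratings:
        B[a][j] = 1  # binary ratings: the rating value itself is not stored
    return B


def rated_items(B: List[List[int]]) -> List[Set[int]]:
    """N_D(u_a) for every user a: the set of items the user has rated."""
    return [{j for j, x in enumerate(row) if x == 1} for row in B]


def implicit_edges(B: List[List[int]]) -> Set[Tuple[int, int]]:
    """Edges of G per Definition 2: (a, b) is an edge iff N_D(u_a) and N_D(u_b) intersect."""
    items = rated_items(B)
    return {(a, b) for a, b in combinations(range(len(B)), 2) if items[a] & items[b]}


# Hypothetical toy data: 4 users, 3 items, given as (user, item) tuples.
ratings = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]
B = build_biadjacency(ratings, n=4, m=3)
print(sorted(implicit_edges(B)))  # [(0, 1), (1, 2)]
```

User 3 shares no item with anyone, so it becomes an isolated vertex of $G$; the later sketches reuse this toy matrix.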

Next, we formally define the two problems studied in this paper.

Implicit User Network Design
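Input: The user-item rating data $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$.

Problem: Design the implicit user network $G(V, E)$ such that for all $u_a, u_b \in \mathcal{U}$ with $a \neq b$, $(u_a u_b) \in E$ if and only if $N_{\mathcal{D}}(u_a) \cap N_{\mathcal{D}}(u_b) \neq \emptyset$.

Implicit User Network Design With Connectivity Checking
Input: The user-item rating data $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$.

Problem: Design the implicit user network $G(V, E)$ such that for all $u_a, u_b \in \mathcal{U}$ with $a \neq b$, $(u_a u_b) \in E$ if and only if $N_{\mathcal{D}}(u_a) \cap N_{\mathcal{D}}(u_b) \neq \emptyset$, and obtain all the connected components of $G$.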

Table 1 contains the symbols and notations used in this study; many of them have not been introduced yet. In the next section, we describe the proposed algorithms for both problems along with a detailed analysis.

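Notation | Meaning
$\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ | User-item rating data
$\mathcal{U}$ | The set of users in $\mathcal{D}$
$\mathcal{I}$ | The set of items in $\mathcal{D}$
$|\mathcal{E}|$ | Number of edges in $\mathcal{D}$
$n$ | The number of users in $\mathcal{D}$, i.e., $n = |\mathcal{U}|$
$m$ | The number of items in $\mathcal{D}$, i.e., $m = |\mathcal{I}|$
$G(V, E)$ | The implicit user network among the users in $\mathcal{D}$
$V$ | The set of vertices of $G$, i.e., $V = \mathcal{U}$
$E$ | The set of edges of $G$
$e_G$ | The number of edges of $G$, i.e., $e_G = |E|$
$\mathcal{B}$ | The bi-adjacency matrix of $\mathcal{D}$
$\mathcal{B}_{aj}$ | $(a, j)$-th entry of $\mathcal{B}$
$A$ | Adjacency matrix of $G$
$\mathcal{B}^T$ | Transpose of $\mathcal{B}$
$|X|$ | Number of elements in the set $X$
$N_X(v)$ | Neighborhood of the node $v$ in $X$
$\deg_X(v)$ | Degree of the node $v$ in $X$, i.e., $\deg_X(v) = |N_X(v)|$
$\Delta$ | Maximum degree among the nodes in $\mathcal{I}$
$\omega$ | Exponent for matrix multiplication
$u \overset{k}{\rightsquigarrow}_X v$ | $v$ is reachable from $u$ by a path of length $k$ in $X$
Table 1: Notations used in this study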

4 Proposed Methodologies

This section is broadly divided into two subsections containing the solution methodologies for the problems with detailed analysis.

4.1 Solution Methodologies for the Implicit User Network Design Problem

Here, we present three different solution methodologies for the implicit user network design problem.

4.1.1 Exhaustive Search Approach

As its name suggests, this method exhaustively checks every pair of users in $\mathcal{U}$ for a common item in $\mathcal{I}$ that has been rated by both users of the pair. If such an item exists, the algorithm puts 1 in the corresponding entry of the adjacency matrix $A$ of $G$. Algorithm 1 formally describes the procedure for designing the implicit user network from the user-item rating data.

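Data: User-item rating data $\mathcal{D}$ as a bi-adjacency matrix ($\mathcal{B}$).
Result: Adjacency matrix ($A$) of $G$.
1 $n \leftarrow |\mathcal{U}|$;
2 $m \leftarrow |\mathcal{I}|$;
3 $A \leftarrow 0_{n \times n}$;
4 for $a = 1$ to $n$ do
5       for $b = a + 1$ to $n$ do
6             for $j = 1$ to $m$ do
7                   if $\mathcal{B}_{aj} = 1$ and $\mathcal{B}_{bj} = 1$ then
8                         $A_{ab} \leftarrow 1$;
9                         $A_{ba} \leftarrow 1$;
10                         break;
12                  else
13                         $A_{ab} \leftarrow 0$;
14                         $A_{ba} \leftarrow 0$;
Algorithm 1 Exhaustive Search Approach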

Now, we analyze the time and space requirements of Algorithm 1. As there are $n$ users in $\mathcal{U}$, the number of possible user pairs is $\binom{n}{2} = O(n^2)$. For each user pair, we need to check whether there exists a common item, which takes $O(m)$ time. So, the total time requirement is $O(n^2 m)$. The extra space consumed by Algorithm 1 is for storing the adjacency matrix of $G$, which is $O(n^2)$. Hence, Theorem 1 holds.

Theorem 1

The running time and space requirement of Algorithm 1 are $O(n^2 m)$ and $O(n^2)$, respectively.
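A direct Python transcription of Algorithm 1 may help; it is a sketch assuming $\mathcal{B}$ is given as a list of 0/1 rows, one per user. The triple loop mirrors the $O(n^2 m)$ bound of Theorem 1, and the `break` stops scanning items as soon as a common one is found. The explicit else-branch of Algorithm 1 is omitted because the matrix already starts at 0.

```python
from typing import List


def exhaustive_search(B: List[List[int]]) -> List[List[int]]:
    """Sketch of Algorithm 1: test every user pair for a commonly rated item."""
    n = len(B)
    m = len(B[0]) if n else 0
    A = [[0] * n for _ in range(n)]  # adjacency matrix of G, initially empty
    for a in range(n):
        for b in range(a + 1, n):  # each unordered pair {u_a, u_b} once
            for j in range(m):
                if B[a][j] == 1 and B[b][j] == 1:
                    A[a][b] = A[b][a] = 1  # common item i_j found
                    break  # one common item suffices
    return A


B = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical toy data
print(exhaustive_search(B))  # [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```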

4.1.2 Clique Addition Approach

This methodology works based on the principle stated in Lemma 1.

Lemma 1

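Let $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ be the user-item rating data. Then, for every $i \in \mathcal{I}$, $N_{\mathcal{D}}(i)$ is a clique in $G$.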

Proof

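As mentioned previously, in the bipartite graph $\mathcal{D}$, $N_{\mathcal{D}}(u) \subseteq \mathcal{I}$ for every $u \in \mathcal{U}$, and $N_{\mathcal{D}}(i) \subseteq \mathcal{U}$ for every $i \in \mathcal{I}$. Now, any two users $u_a, u_b \in N_{\mathcal{D}}(i)$, for some $i \in \mathcal{I}$, always have $i$ as a common item, and hence $(u_a u_b) \in E$. This holds for every user pair of $N_{\mathcal{D}}(i)$. Hence, for every $i \in \mathcal{I}$, $N_{\mathcal{D}}(i)$ is a clique in $G$.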

As an example, in Figure 1 the users who have rated any single item form a clique in $G$. Based on this observation, the implicit user network can be constructed as follows: starting with an empty graph whose vertices are the users in $\mathcal{U}$, add the clique on $N_{\mathcal{D}}(i)$ for every $i \in \mathcal{I}$. Algorithm 2 performs this task.

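Data: User-item rating data $\mathcal{D}$ as a bi-adjacency matrix ($\mathcal{B}$).
Result: Adjacency matrix ($A$) of $G$.
1 $n \leftarrow |\mathcal{U}|$;
2 $m \leftarrow |\mathcal{I}|$;
3 $A \leftarrow 0_{n \times n}$;
4 for each $i_j \in \mathcal{I}$ do
5       $L \leftarrow \emptyset$;
6       for each $u_a \in \mathcal{U}$ do
7             if $\mathcal{B}_{aj} = 1$ then
8                   $L \leftarrow L \cup \{u_a\}$;
11      if $|L| > 1$ then
12             for $x = 1$ to $|L| - 1$ do
13                   $u \leftarrow L[x]$;
14                   for $y = x + 1$ to $|L|$ do
15                         $v \leftarrow L[y]$;
16                         $A_{uv} \leftarrow 1$;
17                         $A_{vu} \leftarrow 1$;
Algorithm 2 Clique Addition Approach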

Now, we analyze Algorithm 2 to understand its time and space requirements. Let $\Delta$ be the maximum degree among the item vertices of $\mathcal{D}$, i.e., $\Delta = \max_{i \in \mathcal{I}} \deg_{\mathcal{D}}(i)$. Hence, the clique induced in $G$ by each item can have as many as $\Delta$ vertices, and adding one such clique to the graph requires $O(\Delta^2)$ time. Collecting the users who rated an item by scanning a column of $\mathcal{B}$ takes $O(n)$ time. As there are $m$ items in the user-item rating data, the running time of Algorithm 2 is $O(m(n + \Delta^2))$. The additional space consumed by Algorithm 2 is for storing the users who rated the current item (Lines 5 to 8 of Algorithm 2), which takes $O(\Delta)$ space, and for storing the adjacency matrix of $G$, which takes $O(n^2)$ space. As $\Delta \leq n$, the total space requirement of Algorithm 2 is $O(n^2)$. Hence, Theorem 2 holds.

Theorem 2

The running time and space requirement of Algorithm 2 are $O(m(n + \Delta^2))$ and $O(n^2)$, respectively.
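Algorithm 2 can be sketched in Python as follows, again assuming $\mathcal{B}$ is a list of 0/1 rows. Collecting the raters of item $i_j$ is a column scan of $\mathcal{B}$, and the nested loop over `raters` then adds the clique of Lemma 1 with $O(\deg_{\mathcal{D}}(i_j)^2)$ work. An item rated by a single user adds no edge.

```python
from typing import List


def clique_addition(B: List[List[int]]) -> List[List[int]]:
    """Sketch of Algorithm 2: for every item, add the clique on the users who rated it."""
    n = len(B)
    m = len(B[0]) if n else 0
    A = [[0] * n for _ in range(n)]
    for j in range(m):
        raters = [a for a in range(n) if B[a][j] == 1]  # N_D(i_j), via a column scan
        # By Lemma 1, all raters of i_j are pairwise adjacent in G.
        for x in range(len(raters)):
            for y in range(x + 1, len(raters)):
                u, v = raters[x], raters[y]
                A[u][v] = A[v][u] = 1
    return A


B = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical toy data
print(clique_addition(B))  # [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

If the rating data arrives as tuples, the item neighborhoods can be built directly from them, which removes the column scan and leaves only the clique work.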

Now, it is important to observe that, in the worst case, $\Delta$ can be $\Theta(n)$. If $\deg_{\mathcal{D}}(i) = \Theta(n)$ for all $i \in \mathcal{I}$, the running time of Algorithm 2 becomes $O(n^2 m)$, which is no better than that of Algorithm 1. In practice, this happens when all items are popular (i.e., rated by many users). However, real rating data is extremely sparse Grčar et al. (2005): very few items are rated by many users, and the majority of the items are rated by only a few users. In this situation, Algorithm 2 should take less computational time than Algorithm 1, and this is exactly what we observed in our experiments, described in Section 5.
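The effect of sparsity on Algorithm 2 can be checked with a little arithmetic. The sketch below uses two hypothetical helpers (not part of the paper): one counts the pair updates Algorithm 2 performs, $\sum_{i \in \mathcal{I}} \binom{\deg_{\mathcal{D}}(i)}{2}$, and the other counts the $\binom{n}{2} m$ item checks Algorithm 1 makes in the worst case, when every pair scans all $m$ items. The degree distributions are invented for illustration and are not taken from the datasets of Section 5; the column scans that collect the raters are not counted.

```python
from typing import Sequence


def clique_pair_updates(item_degrees: Sequence[int]) -> int:
    """Pair updates made by Algorithm 2: C(d, 2) for an item rated by d users."""
    return sum(d * (d - 1) // 2 for d in item_degrees)


def exhaustive_worst_case_checks(n: int, m: int) -> int:
    """Item checks made by Algorithm 1 when every user pair scans all m items."""
    return n * (n - 1) // 2 * m


n, m = 1000, 500
sparse = [1000, 1000] + [3] * (m - 2)  # two popular items, the rest rated by 3 users each
popular = [1000] * m                   # every item rated by every user

print(clique_pair_updates(sparse))         # 1000494
print(clique_pair_updates(popular))        # 249750000
print(exhaustive_worst_case_checks(n, m))  # 249750000
```

In the all-popular case the two counts coincide, matching the remark that Algorithm 2 is then no better than Algorithm 1; in the sparse case the clique work is more than two orders of magnitude smaller.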

4.1.3 Matrix Multiplication Method

The intuition behind this method is that if $M$ is the adjacency matrix of an undirected, unweighted graph, then the $(a, b)$-th entry of $M^k$ equals the number of walks of length $k$ between the vertices $a$ and $b$ in that graph. Lemma 2 states this fact in the present problem context.

Lemma 2

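Let $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ be a user-item rating data and $G(V, E)$ be the designed implicit user network. Then, for any pair of users $u_a$ and $u_b$ of $\mathcal{U}$, $(u_a u_b) \in E$ if and only if they are connected by at least one path of length 2 in $\mathcal{D}$. Mathematically,

$$(u_a u_b) \in E \iff u_a \overset{2}{\rightsquigarrow}_{\mathcal{D}} u_b.$$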

Proof

As this is an 'if and only if' statement, we prove both directions. First, we prove the forward direction. Assume that there exists an edge between $u_a$ and $u_b$ in $G$. This means that they have at least one common item in $\mathcal{D}$. Without loss of generality, let the common item be $i$, i.e., $i \in N_{\mathcal{D}}(u_a) \cap N_{\mathcal{D}}(u_b)$. Hence, both edges $(u_a i)$ and $(i u_b)$ are present in $\mathcal{D}$, so $u_b$ is reachable from $u_a$ via the path $u_a \rightarrow i \rightarrow u_b$, whose length is two. This shows that if $(u_a u_b) \in E$, then $u_a$ and $u_b$ are connected by at least one path of length 2 in $\mathcal{D}$.

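For the reverse direction, assume that there exists a path of length 2 between $u_a$ and $u_b$ in $\mathcal{D}$. As $\mathcal{D}$ is bipartite, there must exist a vertex $i \in \mathcal{I}$ such that $u_a \rightarrow i \rightarrow u_b$ is a path of length 2. This implies that $i \in N_{\mathcal{D}}(u_a) \cap N_{\mathcal{D}}(u_b)$. Hence, by the definition of the implicit user network, $(u_a u_b) \in E$. This completes the proof.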

Now, we report another interesting observation in Lemma 3, which relates the diagonal of $\mathcal{B}\mathcal{B}^T$ to the degrees of the users in $\mathcal{D}$.

Lemma 3

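Let $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ be the user-item rating data and $\mathcal{B}$ be its bi-adjacency matrix. Let $(\mathcal{B}\mathcal{B}^T)_{aa}$ denote the $(a, a)$-th entry of $\mathcal{B}\mathcal{B}^T$. Then, for all $a \in [n]$, $(\mathcal{B}\mathcal{B}^T)_{aa} = \deg_{\mathcal{D}}(u_a)$.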

Proof

It has been mentioned before that, for a bipartite graph represented by its bi-adjacency matrix $\mathcal{B}$, the $(a, b)$-th entry of $\mathcal{B}\mathcal{B}^T$ gives the number of paths of length 2 between the vertices $u_a$ and $u_b$. Hence, the $(a, a)$-th entry is the number of length-2 walks that start and end at $u_a$. Any such walk goes from $u_a$ to some item $i \in N_{\mathcal{D}}(u_a)$ and comes back to $u_a$, traversing the edge $(u_a i)$ twice. Thus, the number of such walks equals the number of edges incident on $u_a$, which is the degree of $u_a$. Equivalently, since the entries of $\mathcal{B}$ are binary, $(\mathcal{B}\mathcal{B}^T)_{aa} = \sum_{j=1}^{m} \mathcal{B}_{aj}^2 = \sum_{j=1}^{m} \mathcal{B}_{aj} = \deg_{\mathcal{D}}(u_a)$. Hence, $(\mathcal{B}\mathcal{B}^T)_{aa} = \deg_{\mathcal{D}}(u_a)$ for all $a \in [n]$. This completes the proof.
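Lemmas 2 and 3 are easy to check numerically. The following sketch, assuming the same hypothetical 0/1 row representation of $\mathcal{B}$ and toy data as before, forms $\mathcal{B}\mathcal{B}^T$ with nested comprehensions and confirms that each off-diagonal entry counts the common items (length-2 paths through an item), while each diagonal entry equals the user's degree.

```python
from typing import List


def times_transpose(B: List[List[int]]) -> List[List[int]]:
    """Return C = B * B^T, so C[a][b] = sum_j B[a][j] * B[b][j]."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in B] for row_a in B]


def common_items(B: List[List[int]], a: int, b: int) -> int:
    """Number of items rated by both u_a and u_b, i.e., length-2 paths u_a -> i -> u_b in D."""
    return sum(1 for x, y in zip(B[a], B[b]) if x == 1 and y == 1)


B = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical toy data
C = times_transpose(B)
n = len(B)
# Lemma 3: the diagonal holds the user degrees deg_D(u_a).
assert [C[a][a] for a in range(n)] == [sum(row) for row in B]
# Lemma 2: an off-diagonal entry is positive iff u_a and u_b share an item.
assert all(C[a][b] == common_items(B, a, b) for a in range(n) for b in range(n) if a != b)
print([C[a][a] for a in range(n)])  # [1, 2, 1, 1]
```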

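Data: User-item rating data $\mathcal{D}$ as a bi-adjacency matrix ($\mathcal{B}$).
Result: Adjacency matrix ($A$) of $G$.
1 $n \leftarrow |\mathcal{U}|$;
2 $m \leftarrow |\mathcal{I}|$;
3 $A \leftarrow 0_{n \times n}$;
4 $\mathcal{B}^T \leftarrow 0_{m \times n}$;
5 for each $a \in [n]$ do
6       for each $j \in [m]$ do
7             $\mathcal{B}^T_{ja} \leftarrow \mathcal{B}_{aj}$;
10 $A \leftarrow \mathcal{B} \cdot \mathcal{B}^T$;
11 for $a = 1$ to $n$ do
12       for $b = 1$ to $n$ do
13             if $a = b$ then
14                   $A_{ab} \leftarrow 0$;
18 for $a = 1$ to $n$ do
19       for $b = 1$ to $n$ do
20             if $A_{ab} > 1$ then
21                   $A_{ab} \leftarrow 1$;
Algorithm 3 Matrix Multiplication-Based Approach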

Algorithm 3 designs the implicit user network using matrix multiplication, and its working principle is as follows. Given the user-item rating data as a bi-adjacency matrix $\mathcal{B}$, it first computes the transpose $\mathcal{B}^T$ (Lines 4 to 9), which takes $O(mn)$ time. Next, it multiplies $\mathcal{B}$ by $\mathcal{B}^T$ (Line 10); the complexity of this step is discussed shortly. Let $A = \mathcal{B}\mathcal{B}^T$ be the obtained matrix, which is of size $n \times n$. Finally, the principal diagonal elements of $A$ are set to 0, and the other elements that are greater than 1 are set to 1 (Lines 11 to 17 and 18 to 24, respectively); this step takes $O(n^2)$ time. If the naive matrix multiplication technique is used to multiply $\mathcal{B}$ (of dimension $n \times m$) and $\mathcal{B}^T$ (of dimension $m \times n$), this step takes $O(n^2 m)$ time, and the running time of the algorithm becomes $O(n^2 m)$, which is no better than Algorithm 1. However, faster rectangular matrix multiplication algorithms exist Bläser (2013), one of which is due to Le Gall (2012). In general, two rectangular matrices of size $n \times m$ and $m \times n$ with $m \leq n$ can be multiplied in $O(n^{\omega})$ time, where $\omega$ is referred to as the matrix multiplication exponent and $2 \leq \omega < 2.373$ Chiantini et al. (2018). Hence, the running time of Algorithm 3 is $O(mn + n^{\omega})$, which is $O(n^{\omega})$ when $m \leq n$. The additional space consumed by Algorithm 3 is for storing $\mathcal{B}^T$, which requires $O(mn)$ space, and $A$, which requires $O(n^2)$ space. Hence, the total space required by Algorithm 3 is $O(mn + n^2)$, and Theorem 3 holds.

Theorem 3

Algorithm 3 can be implemented in $O(mn + n^{\omega})$ time and $O(mn + n^2)$ space, where $\omega$ is the exponent for rectangular matrix multiplication.

The following example demonstrates the working principle of Algorithm 3. Multiplying the bi-adjacency matrix ($\mathcal{B}$) of the rating data in Figure 1 by its transpose ($\mathcal{B}^T$), we obtain the following:

  

As stated by Lemma 3, the principal diagonal entries of $\mathcal{B}\mathcal{B}^T$ equal the degrees of the corresponding users in $\mathcal{D}$, which can be verified from Figure 1. Now, setting all the principal diagonal entries of $\mathcal{B}\mathcal{B}^T$ to 0 and all other entries greater than one to one, we have

It can now easily be verified that this matrix is the same as the adjacency matrix ($A$) of the implicit user network shown in Figure 1.
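A sketch of Algorithm 3 in Python is given below, assuming $\mathcal{B}$ is a list of 0/1 rows and reusing the hypothetical toy data of the earlier sketches. It uses the naive $O(n^2 m)$ product, so it illustrates the transformation rather than the fast bound of Theorem 3. In practice the product would be delegated to an optimized library routine, although such routines generally use classical kernels rather than the asymptotically faster rectangular algorithms cited above.

```python
from typing import List


def matrix_multiplication_approach(B: List[List[int]]) -> List[List[int]]:
    """Sketch of Algorithm 3: A = B * B^T, then zero the diagonal and binarize."""
    n = len(B)
    m = len(B[0]) if n else 0
    BT = [[B[a][j] for a in range(n)] for j in range(m)]  # transpose, m x n
    # Naive product; C[a][b] counts the items rated by both u_a and u_b (Lemma 2).
    C = [[sum(B[a][k] * BT[k][b] for k in range(m)) for b in range(n)] for a in range(n)]
    for a in range(n):
        C[a][a] = 0  # diagonal holds deg_D(u_a) (Lemma 3); G has no self-loops
        for b in range(n):
            if C[a][b] > 1:
                C[a][b] = 1  # several common items still give a single edge
    return C


B = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical toy data
print(matrix_multiplication_approach(B))  # [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```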

4.2 Implicit User Network Design with Connectivity Checking

As mentioned previously, the connectivity information among the users of the implicit user network is important for different commercial applications of the E-Commerce house, including viral marketing and computational advertisement. Here, we address this issue with the following two methods.

4.2.1 Method 1 (Sequential Approach: Designing and then Connectivity Checking)

Algorithm 4 describes the simplest approach for solving the designing and connectivity checking problem. In this approach, the implicit user network is first designed using any one of the three algorithms presented in Section 4.1, and breadth-first search (BFS) is subsequently run on the designed network to obtain its connected components.

Data: Bi-adjacency matrix ($\mathcal{B}$) of $\mathcal{D}$.
Result: Adjacency matrix ($A$) of $G$ and $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$, where each $C_t$, $t \in [k]$, is a connected component of $G$.
1 Step 1: Apply any one of Algorithms 1, 2, or 3 to design the network.
Step 2: Run the breadth-first search (BFS) algorithm on $G$ to find its connected components.
Algorithm 4 Sequential approach for designing and connectivity checking of the implicit social network
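Algorithm 4 can be sketched as below. In this sketch the design step uses clique addition (any of Algorithms 1 to 3 would do), and BFS runs directly on the adjacency matrix, so scanning a row costs $O(n)$ and BFS costs $O(n^2)$ overall; with adjacency lists it would be $O(n + e_G)$. The 0/1 list-of-rows input format and the function names are assumptions of the sketch.

```python
from collections import deque
from typing import List, Tuple


def design(B: List[List[int]]) -> List[List[int]]:
    """Step 1: build the adjacency matrix of G (clique addition, as in Algorithm 2)."""
    n = len(B)
    m = len(B[0]) if n else 0
    A = [[0] * n for _ in range(n)]
    for j in range(m):
        raters = [a for a in range(n) if B[a][j] == 1]
        for x in range(len(raters)):
            for y in range(x + 1, len(raters)):
                A[raters[x]][raters[y]] = A[raters[y]][raters[x]] = 1
    return A


def bfs_components(A: List[List[int]]) -> List[List[int]]:
    """Step 2: BFS from every unvisited user; each BFS tree is one connected component."""
    n = len(A)
    visited = [False] * n
    components = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        component, queue = [s], deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if A[u][v] == 1 and not visited[v]:
                    visited[v] = True
                    component.append(v)
                    queue.append(v)
        components.append(component)
    return components


def sequential_approach(B: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Design G first, then find its connected components."""
    A = design(B)
    return A, bfs_components(A)


B = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical toy data
A, components = sequential_approach(B)
print(components)  # [[0, 1, 2], [3]]
```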

Before analyzing Algorithm 4, we first state and prove the following lemma.

Lemma 4

Even if the user-item rating data is sparse, the implicit user network may be dense.

Proof

Assume that $\mathcal{D}(\mathcal{U}, \mathcal{I}, \mathcal{E}, \mathcal{W})$ is a user-item rating data with $|\mathcal{U}| = n$ and $|\mathcal{I}| = m$. We call $\mathcal{D}$ sparse if $|\mathcal{E}| = O(n + m)$. Trivially, $\deg_{\mathcal{D}}(i) \leq n$ for any $i \in \mathcal{I}$. Assume that in $\mathcal{D}$, a constant number (say $c$) of items have degree $\Theta(n)$ and the remaining $m - c$ items have degree $O(1)$. The number of edges of $\mathcal{D}$ equals the sum of the degrees of the items, which can be bounded as follows:

$$|\mathcal{E}| = \sum_{i \in \mathcal{I}} \deg_{\mathcal{D}}(i) = c \cdot \Theta(n) + (m - c) \cdot O(1) = O(n + m) \tag{1}$$

This shows that if a few items have degree $\Theta(n)$ and the remaining items have degree $O(1)$, the user-item rating data is sparse. Now, pick any item having $\Theta(n)$ neighboring users in $\mathcal{D}$. As per Lemma 1, this item induces a clique of size $\Theta(n)$ in $G$. A clique on $k$ vertices has $k(k-1)/2$ edges, so the implicit user network has $\Theta(n^2)$ edges, i.e., it is dense. Hence, even sparse user-item rating data may lead to a dense implicit user network. This completes the proof.
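The construction in the proof of Lemma 4 can be instantiated numerically. In the hypothetical instance below, one item is rated by all $n$ users and each of the other $m - 1$ items by exactly one user, so $|\mathcal{E}| = n + m - 1$ grows linearly, while $G$ is the complete graph with $n(n-1)/2$ edges. The helper names are illustrative only.

```python
from itertools import combinations
from typing import List, Set, Tuple


def lemma4_instance(n: int, m: int) -> List[Set[int]]:
    """Item neighborhoods N_D(i): item 0 is rated by every user, item j > 0 by user j mod n."""
    return [set(range(n))] + [{j % n} for j in range(1, m)]


def edge_counts(item_users: List[Set[int]]) -> Tuple[int, int]:
    """Return (|E| of the rating data D, number of edges e_G of the implicit user network G)."""
    rating_edges = sum(len(users) for users in item_users)  # sum of item degrees
    user_edges = set()
    for users in item_users:
        user_edges.update(combinations(sorted(users), 2))  # clique of Lemma 1
    return rating_edges, len(user_edges)


print(edge_counts(lemma4_instance(n=100, m=200)))  # (299, 4950)
```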

Now, we analyze Algorithm 4 for its time and space requirements. Let $e_G$ be the number of edges in the implicit social network $G$. Performing BFS on $G$ with an adjacency-list representation requires $O(n + e_G)$ time. As shown in Lemma 4, even for sparse user-item rating data, $e_G$ can be $\Theta(n^2)$. Hence, performing BFS takes $O(n^2)$ time in the worst case, which is also the cost of BFS on the adjacency matrix $A$. As a simple implementation of BFS requires linear space, the additional space required for performing BFS is $O(n)$. Naturally, the running time of Algorithm 4 depends on which algorithm is used for designing the implicit user network. If Algorithm 2 is used, the time and space requirements of Algorithm 4 are $O(m(n + \Delta^2) + n^2)$ and $O(n^2)$, respectively. Hence, Theorem 4 holds.

Theorem 4

Designing and connectivity checking of the implicit user network can be done in $O(m(n + \Delta^2) + n^2)$ time and $O(n^2)$ space.

However, designing and connectivity checking of the implicit user network can also be performed at the same time, which is more beneficial in terms of computational time. We describe this method in the following subsection.

4.2.2 Method 2 (Concurrent Approach: Designing and Connectivity Checking Simultaneously)

For a given user-item rating data, Algorithm 5 performs the designing and connectivity checking of the implicit user network simultaneously. As observed in our experiments, this method is much more efficient than Algorithm 4. Its working principle is as follows. Lines 1 to 3 are mostly initialization statements, where we create the adjacency matrix ($A$) of $G$ and a boolean flag array of length $m$, and initialize both with 0. A flag value of 1 for an item $i$ means that the clique on $N_{\mathcal{D}}(i)$ has been added to the implicit user network. The rest of the algorithm works as follows. As long as the set of users has not been exhausted, a new component is started and a user is picked at random from the remaining set of users. Let the randomly chosen user be $u$. The next step is to find the neighbors of $u$ in $\mathcal{D}$, i.e., $N_{\mathcal{D}}(u)$, which form the initial item list. After that, for every item in the item list, the following steps are performed.

  • Pick an item $i$ from the list. If its flag is 0 (which means that the clique on its neighborhood $N_{\mathcal{D}}(i)$ has not been added yet), invoke the clique-addition function. This function works as follows: if the neighborhood size is 1, it simply returns; otherwise, for every pair of vertices in the neighborhood, it puts 1 in the corresponding entries of $A$.

  • Once the clique is added, the flag of the item is set to 1.

  • The vertices of the clique that are not yet in the current connected component are included in it and removed from the set of remaining users.

  • Next, take all the neighboring users of the item, and for each of these users, take their neighboring items in $\mathcal{D}$. Check the flags of these items; if the flag of an item is 0, put that item into the item list.

These steps are repeated until the item list becomes empty. This completes the description of Algorithm 5. Next, we report a few important observations that help to argue the correctness and the running time of this algorithm.

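Data: Bi-adjacency matrix ($\mathcal{B}$) of $\mathcal{D}$.
Result: Adjacency matrix of $G$ and the components of $G$.
$\mathcal{U} \leftarrow$ users of $\mathcal{D}$;
  // The set of users
$\mathcal{I} \leftarrow$ items of $\mathcal{D}$;
  // The set of items
$n \leftarrow |\mathcal{U}|$;
  // Number of users
$m \leftarrow |\mathcal{I}|$;
  // Number of items
1 ;
$A \leftarrow 0_{n \times n}$;
  // Adjacency matrix of the user network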
2 ;
3 ;
4 while  do
5       ;
6       ;
7       ;
8       ;
9       for  do
10             if  then
11                   ;
12                   ;
13                   for  do
14                         if  then
15                               ;
16                              
17                        
18                  ;
19                   for  do
20                         for  do
21                               if  then
22                                     ;
23                                    
24                              
25                        
26                  
27            
28      
29
30 if  then
31       return;
32      
33else
34       for  do
35             ;
36             ;
37            
38      
Algorithm 5 Concurrent approach for designing and connectivity checking problem.
Observation 1

In Algorithm 5, the number of iterations of the while loop equals the number of connected components of the implicit user network.

Proof

We prove this statement by analyzing the control flow of Algorithm 5. Suppose that in the first iteration of the while loop the user u is chosen. An item, say i, is then picked from the neighborhood of u (the items rated by u), and the clique consisting of the users in the neighborhood of i is added. The nodes of this clique that are not yet in the current component are included in it, and all the items rated by the users of the clique are added to the list. For any such item j, the neighborhoods of i and j share at least one user, so the cliques corresponding to i and j are connected. Applying this argument iteratively, the subgraph induced by the cliques of all the items that pass through the list is connected. Conversely, every item rated by a user of the current component is eventually processed, so every user who shares an item with the component is also absorbed into it; hence the component is maximal. The list becomes empty exactly when the connected component currently being built is complete. So, once a user is chosen at random, the algorithm first finishes constructing the entire connected component to which that user belongs, and only then chooses another user uniformly at random from the remaining users to construct the next connected component of the implicit user network. Hence, the number of times a user is chosen at random equals the number of connected components, which implies that the number of iterations of the while loop equals the number of connected components of the implicit user network. This proves the statement.

Observation 2

The clique-addition function of Algorithm 5 is invoked exactly once for every item that has at least one rating.

Proof

Observation 1 shows that once a user is chosen at random, the entire connected component containing that user is built without choosing any further user at random. It therefore suffices to show that, within a single iteration of the while loop, the clique-addition function is called only once for each item linked to the users of the component currently being built. The function is invoked only for an item whose flag is False, and as soon as the clique corresponding to the item has been added, its flag is set to True. Moreover, an item whose flag is already True is never appended to the list. Since flags are never reset, the function cannot be invoked for the same item again, either later in the same iteration or in a later iteration of the while loop. Hence, the clique-addition function is invoked just once for every item. This completes the proof.

We now analyze the time and space requirements of Algorithm 5. Let n_U and n_I denote the numbers of users and items, respectively, and let Δ denote the maximum degree of an item, i.e., the largest number of users who have rated the same item. The initialization statements are executed only once, before the while loop. By Observation 1, the number of iterations of the while loop equals the number of connected components of the implicit user network. In the worst case, an n_U-vertex network can have as many as n_U connected components (when every vertex is isolated). In practice, however, social networks contain a giant component that covers a significant fraction of the nodes; for example, in the Twitter follow graph Myers et al. (2014), the largest connected component contains the bulk of the active users. We therefore assume in our analysis that the number of connected components of the implicit user network is bounded by a constant, so that the while loop runs a constant number of times.

Inside the while loop, the statements that start a new component take constant time. For the randomly chosen user, computing its neighborhood requires scanning the user's row of the bi-adjacency matrix, which takes O(n_I) time. By Observation 2, the clique-addition function is invoked at most once for every item. Its running time depends on the size of the clique: since an item has at most Δ neighbors, the function visits at most Δ(Δ - 1)/2 pairs of users and runs in O(Δ²) time. Setting the flag of the item whose clique has just been added takes O(1) time. The loop that adds the clique's users to the current component runs at most Δ times, and, as a component can contain up to n_U users, each membership check takes O(n_U) time, so this loop takes O(Δ n_U) time. Removing the clique's users from the set of remaining users likewise takes at most O(Δ n_U) time. The two nested for loops that scan the items of the clique's users execute at most Δ and n_I times, respectively, and the flag check and the list append inside them take O(1) time. Unwrapping these costs from the innermost statement outwards, the inner loop over a user's items takes O(n_I) time, the two nested loops together take O(Δ n_I) time, and processing a single item takes O(Δ² + Δ n_U + Δ n_I) time. Since Δ ≤ n_U, we have Δ² ≤ Δ n_U, so processing an item takes O(Δ(n_U + n_I)) time. Summing over the items, each of which is processed at most once over the whole run, and over the constant number of iterations of the while loop, the main loop of Algorithm 5 runs in O(n_I Δ(n_U + n_I)) time.

The additional space consumed by Algorithm 5 is due to storing the adjacency matrix, which requires O(n_U²) space; the flag array, which requires O(n_I) space; and all the components together, which require O(n_U) space. Hence, the total space requirement of Algorithm 5 is O(n_U² + n_I). Hence, the following theorem holds.

Theorem 5

The time and space requirements of Algorithm 5 are as derived in the analysis above.

5 Experimental Evaluation

In this section, we report an extensive set of experiments evaluating the proposed methodologies. We begin with a brief description of the datasets.

5.1 Description of the Datasets

In our experiments, we use the following user-item rating datasets, all of which are collected from the Koblenz Network Collection (KONECT) (http://konect.uni-koblenz.de/).

  • Filmtrust Guo et al. (2013): This is a bipartite rating network between users and movies. An undirected edge between a user and an item denotes that the user has rated the item, and the edge weight is the rating value on the dataset's rating scale.

  • MovieLens (http://konect.uni-koblenz.de/networks/movielens-1m): This user-item rating network contains one million movie ratings from http://movielens.umn.edu/. Left nodes are users and right nodes are movies; an edge between a user and a movie indicates that the user has rated the movie.

  • Epinions (http://konect.uni-koblenz.de/networks/epinions-rating) Massa and Avesani (2005): This is the bipartite rating network of Epinions, an online product rating site. Each edge connects a user with a product, and its weight is the rating value.

As our study is concerned with binary ratings, we rescale the rating values of all the datasets to a binary scale. Table 2 reports the basic statistics of the datasets. As mentioned previously, the density values in Table 2 show that the rating datasets are extremely sparse.

Dataset Name # Users # Items # Ratings Density
Filmtrust 1508 2071 35497 0.011366
MovieLens 9746 6040 1000209 0.01699
Epinions 40163 139738 664824 0.000118
Table 2: Basic Statistics of the Datasets
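As a quick sanity check on Table 2, the density column matches the number of ratings divided by the product of the numbers of users and items; this definition is our reading of the table, and the sketch below simply recomputes the column from the other three. The helper name density and the TABLE_2 list are hypothetical.

```python
def density(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that holds a rating."""
    return num_ratings / (num_users * num_items)


# (dataset, users, items, ratings, density reported in Table 2)
TABLE_2 = [
    ("Filmtrust", 1508, 2071, 35497, 0.011366),
    ("MovieLens", 9746, 6040, 1000209, 0.01699),
    ("Epinions", 40163, 139738, 664824, 0.000118),
]

for name, users, items, ratings, reported in TABLE_2:
    computed = density(ratings, users, items)
    print(f"{name}: computed {computed:.6g}, reported {reported}")
```

Rounded to the precision shown in Table 2, each computed value reproduces the reported one, so all three rating matrices have well under 2% of their entries filled.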

5.2 Experimental Setup

As there is no prior work on designing an implicit user network from user-item rating data, we cannot compare the proposed methods with existing ones. Instead, we carry out a comparative study among the proposed methodologies themselves. All the proposed algorithms are implemented in Python 2.7 with the NetworkX 1.9.1 package. All the experiments are carried out on a 5-node high-performance computing cluster in which each node has 32 cores and 64 GB of RAM.

As our goal is to compare the computational time and scalability of the proposed algorithms, for each dataset we start with an initial portion of the ratings of the original dataset, taken from the top, and keep adding more ratings until the whole dataset is exhausted. As the exhaustive search method takes a huge amount of computational time, we do not report its results on the larger datasets (i.e., on datasets other than ‘Filmtrust’).
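The incremental protocol above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the scripts used in the paper: the helper prefix_timings and its parts parameter are hypothetical (the paper's actual starting portion and increment are not reproduced here), the ratings are assumed to be a list already in the dataset's original top-to-bottom order, and time.perf_counter is a Python 3 function, whereas the paper's experiments used Python 2.7.

```python
import time


def prefix_timings(ratings, build, parts=10):
    """Run `build` on growing top-first prefixes of `ratings` and time each run.

    parts: hypothetical number of equal increments.
    Returns a list of (fraction, number_of_ratings, seconds) tuples.
    """
    n = len(ratings)
    results = []
    for k in range(1, parts + 1):
        prefix = ratings[: n * k // parts]   # top portion of the original order
        start = time.perf_counter()
        build(prefix)
        elapsed = time.perf_counter() - start
        results.append((k / parts, len(prefix), elapsed))
    return results
```

Any design function that takes a list of ratings can be passed as build; the last tuple always corresponds to the whole dataset, so the curves in the figures below can be read as running time against the fraction of ratings used.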

5.3 Experimental Results with Observation

Here, we describe the obtained results and list the key observations, starting with the results for the implicit user network design problem.

(a) Filmtrust Dataset
(b) MovieLens Dataset
Figure 2: Portion of the dataset vs. computational time for the implicit user network design problem on different datasets. Here, Algorithms 1, 2, and 3 denote the ‘Exhaustive Search Approach’, ‘Clique Addition Approach’, and ‘Matrix Multiplication-based Approach’, respectively.

Figure 2 shows the portion of the dataset versus computational time for the implicit user network design problem. Figure 2a clearly shows that, on the ‘Filmtrust’ dataset, the ‘exhaustive search approach’ requires far more time than both the ‘clique addition approach’ and the ‘matrix multiplication-based approach’, even when only the top portion of the ratings is used. The key observations are as follows:

  • Among the proposed approaches, the ‘exhaustive search approach’ takes the most computational time, because for every pair of users it searches the entire item set for a common item.

  • Of the remaining two methods, the experiments show that the ‘clique addition approach’ is much faster than the ‘matrix multiplication-based approach’. For example, on the whole ‘Epinions’ dataset, the ‘clique addition approach’ constructs the implicit user network much faster than the ‘matrix multiplication-based approach’.
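To make the comparison concrete, the following Python sketch gives toy versions of the three design approaches. It assumes the binarized ratings are given as a dict mapping each user to the set of items the user rated, and the function names are hypothetical. The matrix multiplication-based version computes the entries of B·Bᵀ (B being the bi-adjacency matrix) directly with plain loops rather than with a fast matrix multiplication algorithm.

```python
from itertools import combinations


def exhaustive_search(items_of):
    """For every pair of users, scan the whole item set for a common item."""
    all_items = set().union(*items_of.values())
    edges = set()
    for u, v in combinations(sorted(items_of), 2):
        if any(i in items_of[u] and i in items_of[v] for i in all_items):
            edges.add((u, v))
    return edges


def clique_addition(items_of):
    """For every item, connect all pairs of users who rated it."""
    users_of = {}
    for u, items in items_of.items():
        for i in items:
            users_of.setdefault(i, set()).add(u)
    edges = set()
    for raters in users_of.values():
        edges.update(combinations(sorted(raters), 2))
    return edges


def matrix_multiplication(items_of):
    """Connect users a and b when entry (a, b) of B * B^T is positive."""
    users = sorted(items_of)
    items = sorted(set().union(*items_of.values()))
    B = [[1 if i in items_of[u] else 0 for i in items] for u in users]
    edges = set()
    for a in range(len(users)):
        for b in range(a + 1, len(users)):
            common = sum(x * y for x, y in zip(B[a], B[b]))  # (B B^T)[a][b]
            if common > 0:
                edges.add((users[a], users[b]))
    return edges
```

All three functions return the same edge set; they differ only in cost. The exhaustive version touches every item for every pair of users and the matrix version computes a dot product for every pair, whereas clique addition only touches pairs of users who actually share an item, which is why it benefits from sparse rating data.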

We therefore conclude that the ‘clique addition approach’ is the best of the three and should be used to construct the implicit user network, particularly when the rating data are sparse. Next, we report the experimental results for the ‘implicit user network design with connectivity checking’ problem.

(a) Filmtrust Dataset
(b) MovieLens Dataset
Figure 3: Portion of the dataset vs. computational time for the implicit user network design and connectivity checking problem on different datasets. Here, Algorithms 4 and 5 refer to the sequential approach (first design, then connectivity checking) and the concurrent approach (design and connectivity checking simultaneously), respectively.

Figure 3 shows the portion of the ratings used versus computational time for all three datasets. Across all the datasets, the concurrent approach, which designs the network and checks its connectivity together (Algorithm 5), is much more efficient than the sequential approach, which first designs the network and then checks its connectivity (Algorithm 4). For example, on the entire ‘MovieLens’ dataset, Algorithm 5 requires substantially less computational time than Algorithm 4. The reason is as follows: in the sequential approach, the edges of the implicit user network are traversed during the design phase and then traversed once more during connectivity checking, which is redundant. Because the concurrent approach maintains the connectivity information during the design itself, it saves this computational time.
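For contrast with the concurrent approach, a minimal Python sketch of the sequential approach is given below: it first builds the network by clique addition and then checks connectivity with a breadth-first search. Breadth-first search is our assumption for the connectivity-checking phase (the paper's implementation is built on NetworkX, whose exact calls are not reproduced here), the input is assumed to be a list of binarized (user, item) pairs, and the function name sequential_design_then_check is hypothetical.

```python
from collections import deque


def sequential_design_then_check(ratings):
    """Design the implicit user network, then find its connected components.

    ratings: iterable of (user, item) pairs (binarized ratings).
    Returns (adj, components) with adj: user -> set of neighbouring users.
    """
    users_of = {}
    adj = {}
    for u, i in ratings:
        users_of.setdefault(i, set()).add(u)
        adj.setdefault(u, set())          # every rating user is a vertex

    # Phase 1 (design): clique addition, one clique per item.
    for raters in users_of.values():
        for u in raters:
            adj[u] |= raters - {u}

    # Phase 2 (connectivity checking): BFS walks the edges built above again.
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return adj, components
```

The second phase visits the adjacency set of every user once more after the network has been built; this second pass over the edges is the redundant work that the concurrent approach avoids by growing each component while its cliques are being added.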

6 Conclusion

In this paper, we have introduced two related problems concerning the design and connectivity checking of the implicit user network from user-item rating data. For the implicit user network design problem, we have proposed three different approaches, namely the exhaustive search approach, the clique addition approach, and the matrix multiplication-based approach. For the implicit user network design with connectivity checking problem, we have proposed two different approaches. The first is a sequential approach that designs the network and then checks its connectivity, and the second is a concurrent approach: an incremental algorithm that performs the design and connectivity checking simultaneously. Experiments with real-world user-item rating datasets show that, for the first problem, the ‘clique addition approach’ performs better than the other two approaches, as the datasets are extremely sparse. For the second problem, the concurrent approach takes less computational time. An interesting direction for future work is to use the implicit social network for social recommendation and for seed set selection in viral marketing, and to study its performance in these applications.

References

  • C. Aggarwal and K. Subbian (2014) Evolutionary network analysis: a survey. ACM Computing Surveys (CSUR) 47 (1), pp. 1–36.
  • M. A. Al-Garadi, K. D. Varathan, S. D. Ravana, E. Ahmed, G. Mujtaba, M. U. S. Khan, and S. U. Khan (2018) Analysis of online social network connections for identification of influential users: survey and open research issues. ACM Computing Surveys (CSUR) 51 (1), pp. 1–37.
  • S. Alsaleh, R. Nayak, Y. Xu, and L. Chen (2011) Improving matching process in social network using implicit and explicit user information. Web Technologies and Applications, pp. 313–320.
  • S. Banerjee, M. Jenamani, and D. K. Pratihar (2017) Algorithms for projecting a bipartite network. In 2017 Tenth International Conference on Contemporary Computing (IC3), pp. 1–3. (Cited in the title note: "The author is supported by the Post Doctoral Fellowship Grant provided by Indian Institute of Technology Gandhinagar (Project No. MIS/IITGN/PD-SCH/201415/006). A small part of this study has been previously published as Banerjee et al. (2017).")
  • S. Banerjee, M. Jenamani, and D. K. Pratihar (2020) A survey on influence maximization in a social network. Knowledge and Information Systems.
  • M. Bläser (2013) Fast matrix multiplication. Theory of Computing, Graduate Surveys 5, pp. 1–60.
  • F. Bonchi, C. Castillo, and D. Ienco (2013) The meme ranking problem: maximizing microblogging virality. Journal of Intelligent Information Systems 40 (2), pp. 211–239.
  • C. Chen, C. Mao, Y. Tang, G. Chen, and J. Zheng (2012) Personalized recommendation based on implicit social network of researchers. In Joint International Conference on Pervasive Computing and the Networked World, pp. 97–107.
  • W. Chen, C. Wang, and Y. Wang (2010) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1029–1038.
  • L. Chiantini, J. D. Hauenstein, C. Ikenmeyer, J. M. Landsberg, and G. Ottaviani (2018) Polynomials and the exponent of matrix multiplication. Bulletin of the London Mathematical Society 50 (3), pp. 369–389.
  • R. Diestel (2000) Graph theory. Graduate Texts in Mathematics, Vol. 173, Springer-Verlag, Berlin and Heidelberg.
  • P. Domingos (2005) Mining social networks for viral marketing. IEEE Intelligent Systems 20 (1), pp. 80–82.
  • D. Frey, A. Jégou, and A. Kermarrec (2011) Social market: combining explicit and implicit social networks. In Symposium on Self-Stabilizing Systems, pp. 193–207.
  • S. Goel and D. G. Goldstein (2013) Predicting individual behavior with social networks. Marketing Science 33 (1), pp. 82–93.
  • M. Grčar, D. Mladenič, B. Fortuna, and M. Grobelnik (2005) Data sparsity issues in the collaborative filtering framework. In International Workshop on Knowledge Discovery on the Web, pp. 58–76.
  • G. Guo, J. Zhang, and N. Yorke-Smith (2013) A novel Bayesian similarity measure for recommender systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pp. 2619–2625.
  • M. Gupte and T. Eliassi-Rad (2012) Measuring tie strength in implicit social networks. In Proceedings of the 4th Annual ACM Web Science Conference, pp. 109–118.
  • S. Hill, F. Provost, and C. Volinsky (2005) Viral marketing: identifying likely adopters via consumer networks.
  • J. Huh (2017) Considerations for application of computational social science research approaches to digital advertising research. New York: Routledge.
  • F. Le Gall (2012) Faster algorithms for rectangular matrix multiplication. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pp. 514–523.
  • Q. Li, B. Kailkhura, J. Thiagarajan, Z. Zhang, and P. Varshney (2017) Influential node detection in implicit social networks using multi-task Gaussian copula models. In NIPS 2016 Time Series Workshop, pp. 27–37.
  • C. Lin, R. Xie, X. Guan, L. Li, and T. Li (2014) Personalized news recommendation via implicit social experts. Information Sciences 254, pp. 1–18.
  • C. Lin, R. Xie, L. Li, Z. Huang, and T. Li (2012) Premise: personalized news recommendation via implicit social experts. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1607–1611.
  • A. Iosup, R. Van De Bovenkamp, S. Shen, A. L. Jia, and F. Kuipers (2014) Analyzing implicit social networks in multiplayer online games. IEEE Internet Computing 18 (3), pp. 36–44.
  • H. Ma, I. King, and M. R. Lyu (2011) Learning to recommend with explicit and implicit social relations. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), pp. 29.
  • P. Massa and P. Avesani (2005) Controversial users demand local trust metrics: an experimental study on epinions.com community. In AAAI, Vol. 5, pp. 121–126.
  • S. A. Myers, A. Sharma, P. Gupta, and J. Lin (2014) Information network or social network? The structure of the Twitter follow graph. In Proceedings of the 23rd International Conference on World Wide Web, pp. 493–498.
  • A. Nauerz and G. Groh (2008) Implicit social network construction and expert user determination in web portals. In AAAI Spring Symposium: Social Information Processing, pp. 60–65.
  • V. Podobnik and I. Lovrek (2015) Implicit social networking: discovery of hidden relationships, roles and communities among consumers. Procedia Computer Science 60, pp. 583–592.
  • W. Reafee, N. Salim, and A. Khan (2016) The power of implicit social relation in rating prediction of social recommender systems. PLoS ONE 11 (5), pp. e0154848.
  • F. Riquelme and P. González-Cantergiani (2016) Measuring user influence on Twitter: a survey. Information Processing & Management 52 (5), pp. 949–975.
  • M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias, and R. Merom (2010) Suggesting friends using the implicit social graph. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 233–242.
  • M. Song, W. Lee, and J. Kim (2010) Extraction and visualization of implicit social relations on social networking services. In Twenty-Fourth AAAI Conference on Artificial Intelligence.
  • S. M. Taheri, H. Mahyar, M. Firouzi, E. Ghalebi K., R. Grosu, and A. Movaghar (2017) Extracting implicit social relation for social recommendation techniques in user rating prediction. In Proceedings of the 26th International Conference on World Wide Web Companion, WWW ’17 Companion, Republic and Canton of Geneva, Switzerland, pp. 1343–1351. ISBN 978-1-4503-4914-7.
  • E. Tasnádi and G. Berend (2015) Supervised prediction of social network links using implicit sources of information. In Proceedings of the 24th International Conference on World Wide Web, pp. 1117–1122.
  • S. Tuarob and C. S. Tucker (2015) A product feature inference model for mining implicit customer preferences within large scale social media networks. ASME IDETC/CIE 15.
  • R. van de Bovenkamp, S. Shen, A. L. Jia, F. Kuipers, et al. (2014) Analyzing implicit social networks in multiplayer online games. IEEE Internet Computing 18 (3), pp. 36–44.
  • S. Wasserman, K. Faust, et al. (1994) Social network analysis: methods and applications. Vol. 8, Cambridge University Press.
  • X. Xiao, P. Fu, Q. Li, G. Hu, and Y. Jiang (2017) Modeling and validation of SMS worm propagation over social networks. Journal of Computational Science.
  • X. Yang, Y. Guo, and Y. Liu (2013) Bayesian-inference-based recommendation in online social networks. IEEE Transactions on Parallel and Distributed Systems 24 (4), pp. 642–651.
  • J. Zhang, Y. Wang, and J. Vassileva (2013) SocConnect: a personalized social network aggregator and recommender. Information Processing & Management 49 (3), pp. 721–737.
  • K. Zhang, S. Bhattacharyya, and S. Ram (2014) Empirical analysis of implicit brand networks on social media. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, pp. 190–199.