The rapid growth in the use of smart mobile devices over the last decade has resulted in an explosive increase in mobile data traffic. The major contributing factor is the increasing number of one-to-many transmissions in the form of multimedia messaging to groups and content posted with fans over social networks (Bastug et al., 2014; Shanmugam et al., 2013; Yang et al., 2018). The resulting redundant and repetitive transmission of content (Zhang and Zhu, 2018) stresses the cellular backhaul.
Proactively caching content at the network edge and locally serving repeat requests will alleviate the wireless backhaul resource stress and also reduce user-perceived latency. Traditional web caching systems do not work as well for modern-day traffic, where an enormous amount of content is created, shared, and forwarded by a large number of users. Online social networks are often organized in the form of overlapping interest groups, such as followers of Facebook pages, Twitter users, and members of Instagram circles. These groups or communities are characterized by their similar interests (their members usually request the same content).
This paper proposes Bingo, a social community-based edge caching scheme. Since the community structure in the (logical) social network is not known to the mobile network operator (MNO), Bingo estimates the groups from the user-request log. Bingo leverages this social locality information (content shared with social groups) to proactively cache content at the network edge (e.g., a base station). Thus, multiple members of a single social group receive the content of their mutual interest from the respective edge nodes. Bingo outperforms traditional caching schemes in terms of cache hit ratio, thus resulting in (i) reduced usage of backhaul bandwidth, and (ii) lower latency for end-users.
A practical feature of Bingo is its flexibility to be deployed incrementally at multiple levels of regional granularity, as per the business priorities of the mobile operators. An operator may choose to deploy Bingo at a single base station (e.g., in one of the cells in Figure 1). This will allow content caching at that base station. For instance, if the content shared with a community is cached at the base station, it will be served locally from this edge cache to the six members of the community served by that base station.
Alternatively, Bingo may be deployed in the cellular packet core of a provider, allowing content caching in the core and bringing the caching benefit to the members of a social community spread across multiple base stations. In Figure 1, if the content destined for a community is cached at the cellular packet core, then all 18 members of the community across the three cells will benefit from this locality of reference. Bingo also supports a cache hierarchy by simultaneous deployment at select base stations, the packet core, and a remote data center. With hierarchical deployment, the decision engine in Bingo will cache a piece of content only at those edge nodes which may serve multiple members of the social community interested in that content.
Many known caching schemes take into account the overlaid logical social network to decide what to cache and where to cache. There also exist collaborative caching schemes using device-to-device (D2D) communication that cache content at users' devices not just for themselves but also for their "social friends" in a tit-for-tat manner. These schemes, however, require complete knowledge of the social graph, focus on one-to-one interactions rather than content sharing among circles of friends, and/or do not adapt to quick changes in content popularity.
Bingo addresses all the above concerns: it does not require knowledge of the social graph; it does not even try to estimate the social graph (respecting the security and privacy of user traffic). Instead, employing existing overlapping community detection techniques on the weighted user network modeled from the request log, Bingo approximates the community structure in the overlaid social network. For making a caching decision, Bingo considers the local (instead of global) popularity of content, which it measures by the interest of the relevant communities. Finally, Bingo also incorporates other features of social traffic, such as geographic locality (content shared among users that are geographically close by) and temporal locality (recent content is more popular), in caching decisions.
We empirically evaluate Bingo in an extensive set of experiments with varying community structures, a wide range of network densities, and file popularity distributions. All parameters used for generating synthetic test data are set to values observed in real-life data and as used in the literature. Our empirical evaluation demonstrates that Bingo achieves up to gain over known edge caching schemes in terms of cache hit-ratio.
Altogether, this paper makes the following contributions:
- A methodology to estimate the (evolving) social communities based on past user requests without additional information
- An open-source prototype implementation of Bingo (including community detection, community identification, and caching decision engine), available at https://github.com/NimrahMustafa/SocialCommunityBasedCaching.git
- A thorough evaluation of Bingo using synthetic social groups on a broad range of parameters
2. Related Work
Content caching has evolved into an integral part of wireless networks. Edge caching in wireless networks employs data and social network analysis and machine learning methods for estimating content popularity (Bernardini et al., 2013), determining request frequency (Ong et al., 2014), optimal content placement (Bastug et al., 2014), and collaborative caching (Yang et al., 2018) between end-users, in which content is stored at user devices and later served on demand using D2D communications (Ji et al., 2016; Golrezaei et al., 2014).
The exact location of the cache plays an important role in the effectiveness of a caching scheme. Content can be cached at the base stations (Poularakis et al., 2014b, a), special purpose femtocells (Shanmugam et al., 2013), or in a hierarchical storage system distributed across the mobile network (Tran et al., 2017; Zhang and Zhu, 2018).
There has been recent research interest in leveraging personal and contextual information to determine what content to cache (Müller et al., 2017). Such human-centric information includes visited locations, gender, job, device type, and other user attributes (Ali et al., 2021). User mobility patterns are used in (Zhang et al., 2019) to pick the best location to serve content from a distributed cache.
Social network information has been utilized to enhance caching performance. In (Bastug et al., 2014) influential nodes are identified as caching locations that will serve their social friends via D2D communications. A more recent approach for D2D caching exploits the knowledge of pairwise social interactions (Weifeng et al., 2020). In (Yang et al., 2018; Zhu et al., 2017) a game-theoretic approach is proposed for mobile users to cache contents, both for themselves and for their friends.
We describe Bingo at the level of a single cell. Owing to the hierarchical and incremental deployability of Bingo, the single-cell model readily extends to caching in the core or a remote datacenter.
Network Model: Consider a base station, connected to the core network with a backhaul link, serving users who are organized into possibly overlapping communities , . Users request files from a library , where each file has an associated popularity . The community structure and are not known at the base station. Cache storage with the capacity to store files is installed at the base station. Without loss of generality, we assume that all files have the same size and a file is either stored completely or not at all (Shanmugam et al., 2013).
Key Idea: The guiding principle for caching in Bingo is that if a file is accessed by a member of some community, then almost the entire community would (ultimately) request it. In this regard, users with similar access patterns (requesting the same content within the same time frame) are considered to be a community. This notion of a community is only for the internal working of the estimation algorithm and does not have any implications on the formation of communities, where membership of users is a direct result of social ties, common interests, and event participation.
Process Overview: The continuous stream of requests is divided into chunks, each containing a fixed number of consecutive requests. All requests arriving in a chunk are logged and used to model a user network, from which the community structure for the next chunk is estimated. This model incorporates the formation of new communities and the dissolution of old ones, as well as members joining or leaving existing communities. The chunk size is defined as a number of requests, rather than as elapsed time, to cater for traffic variability and to ensure that sufficient requests are available to estimate the community structure reliably. It can be altered to accommodate the frequency of significant changes in the community structure. With the estimated community structure in place, for each request in the next chunk, the requesting user may belong to many communities, so we identify the community as a member of which the user has requested the file. A caching decision for the file is then made, and an existing file in the cache may be removed as per the eviction policy. If the file is cached, the remaining requests for it by users of the community can be served locally.
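The chunk-by-chunk loop described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the names `chunked`, `run`, `estimate`, and `serve` are hypothetical, and a request is assumed to be a `(user_id, file_id)` pair.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Request = Tuple[str, str]  # assumed shape: (user_id, file_id)


def chunked(stream: Iterable[Request], chunk_size: int) -> Iterator[List[Request]]:
    """Yield consecutive chunks of `chunk_size` requests (the last may be shorter)."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def run(
    stream: Iterable[Request],
    chunk_size: int,
    estimate: Callable[[List[Request]], Any],
    serve: Callable[[Request, Any], None],
) -> None:
    """Serve each chunk using the structure estimated from the previous chunk."""
    structure = None  # nothing is known before the first chunk has been logged
    for chunk in chunked(stream, chunk_size):
        for request in chunk:
            serve(request, structure)
        # The log of this chunk drives the estimate used for the next chunk.
        structure = estimate(chunk)
```

Because chunks are counted in requests rather than seconds, a quiet period simply delays the next estimate instead of producing an estimate from too few requests.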
Community Detection: To estimate the community structure, we first transform the request log into an undirected weighted graph that models a user-user network. The users form the nodes, and an edge between any pair of users is weighted by the number of common files requested by both users. Edges with weight below a threshold are dropped. The threshold can be set to reduce the likelihood of incorrectly including or excluding a user in a community, i.e., for a given period, its value should be high enough that two users belonging to different communities are unlikely to request at least that many common files, and low enough that members of the same community are likely to do so. Then, an existing community detection algorithm can be run directly on the graph. We use a simple conductance-based algorithm (Lu et al., 2015) for community detection. Although community detection is computationally expensive, it is sufficient to perform this task periodically in the background during off-peak hours, since these communities evolve rather slowly. Bingo would cater to significant changes occurring over a period on the order of hours.
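As an illustration of the graph construction step only (the conductance-based detection itself is not shown), the following sketch builds the thresholded user-user graph from a request log. The function name `build_user_graph` and the parameter `min_weight` are hypothetical, and the log is assumed to be a sequence of `(user_id, file_id)` pairs.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def build_user_graph(
    request_log: Iterable[Tuple[str, str]], min_weight: int
) -> Dict[FrozenSet[str], int]:
    """Return undirected edges {frozenset({u, v}): weight}, where the weight is
    the number of distinct files requested by both u and v."""
    files_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user, file_id in request_log:
        files_by_user[user].add(file_id)  # repeated requests count once

    edges: Dict[FrozenSet[str], int] = {}
    for u, v in combinations(sorted(files_by_user), 2):
        weight = len(files_by_user[u] & files_by_user[v])
        if weight >= min_weight:  # edges below the threshold are dropped
            edges[frozenset((u, v))] = weight
    return edges
```

The pairwise loop is quadratic in the number of users; for a large cell, iterating over the requesters of each file instead would likely scale better.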
Community Identification: A minimum number of requests by users for a file is accumulated before a community is identified, which must be among the set of common communities to which all users who have requested the file belong. The optimal value of this minimum can be fine-tuned depending on the overlap factor of the communities. Since this factor is small in many real-world social networks, communities can be identified accurately in a short time, i.e., for a small value of the minimum. We must account for users who may have individually requested the file and may not have any communities in common with the other requesting users since, otherwise, the intersection of the communities to which the requesting users belong is more likely to be empty. To this end, we sort the requesting users in descending order by the number of communities they belong to. The intersection process begins with the user belonging to the largest number of communities. In each step, the intersection of the set of potentially identifiable communities is taken with the set of communities to which the next user belongs. This process stops when some minimum number of potential communities, a configurable parameter, remains, from which the largest is selected as the identified community.
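One possible reading of this intersection procedure is sketched below. It is an interpretation under stated assumptions, not the prototype's implementation: `identify_community`, `memberships`, `sizes`, and `min_candidates` are hypothetical names, and a user whose communities would shrink the candidate set below `min_candidates` is treated as an individual requester and skipped.

```python
from typing import Dict, Iterable, Optional, Set


def identify_community(
    requesters: Iterable[str],
    memberships: Dict[str, Set[str]],  # user -> estimated communities
    sizes: Dict[str, int],             # community -> number of members
    min_candidates: int = 1,
) -> Optional[str]:
    """Pick the largest community shared by (most of) the requesting users."""
    ordered = sorted(
        requesters, key=lambda u: len(memberships.get(u, ())), reverse=True
    )
    if not ordered:
        return None
    # Start from the user who belongs to the largest number of communities.
    candidates = set(memberships.get(ordered[0], ()))
    for user in ordered[1:]:
        if len(candidates) <= min_candidates:
            break  # enough narrowing has been done
        narrowed = candidates & set(memberships.get(user, ()))
        if len(narrowed) < min_candidates:
            continue  # assumed outlier: requested the file individually
        candidates = narrowed
    if not candidates:
        return None
    # Ties on size are broken by community id so the result is deterministic.
    return max(candidates, key=lambda c: (sizes[c], c))
```

Sorting by membership count first means the candidate set starts as wide as possible, so a single low-overlap user is less likely to empty it early.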
Caching Decision: The caching decision for a file is primarily driven by the size of the identified community, i.e., the number of users belonging to it. We assign each requested file a score and cache the highest-scoring files. If a community is identified, . Otherwise, . For each user who belongs to the identified community and requests the file after the community is identified, the score of the file is decremented. For each cached file, the number of requests elapsed since it was last requested is used to incorporate a recency factor in the caching decision, to avoid unnecessary occupation of cache space. If a file is evicted without having served all its expected requests, the number of served requests is temporarily retained, so that when the file is next cached with its identified community, its score is set based on the remaining expected requests instead of the full community size, which would be a higher score than the file deserves and would adversely impact caching of more deserving files. The record is deleted once the expected number of requests for the file has been served. The cache is modeled as a Min-Heap to support the eviction policy, i.e., to find and delete the file with the minimum score.
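A score-ordered cache with min-score eviction could look roughly like the sketch below. The exact score formulas are not reproduced in the text above, so the caller is assumed to pass a score (for instance, the size of the identified community); the class `ScoreCache` and its methods are hypothetical, and the recency factor and the retained served-request record are deliberately left out.

```python
import heapq
from typing import Dict, List, Tuple


class ScoreCache:
    """Fixed-capacity cache that evicts the file with the minimum score."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.scores: Dict[str, float] = {}         # current score per cached file
        self._heap: List[Tuple[float, str]] = []   # may contain stale entries

    def __contains__(self, file_id: str) -> bool:
        return file_id in self.scores

    def _push(self, file_id: str, score: float) -> None:
        self.scores[file_id] = score
        heapq.heappush(self._heap, (score, file_id))

    def _drop_stale(self) -> None:
        # Lazy deletion: discard heap entries that no longer match the score map.
        while self._heap and self.scores.get(self._heap[0][1]) != self._heap[0][0]:
            heapq.heappop(self._heap)

    def offer(self, file_id: str, score: float) -> bool:
        """Try to admit a file; return True if it is cached afterwards."""
        if file_id in self.scores:
            return True
        if self.capacity <= 0:
            return False
        if len(self.scores) < self.capacity:
            self._push(file_id, score)
            return True
        self._drop_stale()
        min_score, victim = self._heap[0]
        if score <= min_score:
            return False  # not more deserving than the weakest cached file
        heapq.heappop(self._heap)
        del self.scores[victim]
        self._push(file_id, score)
        return True

    def serve(self, file_id: str) -> bool:
        """Serve a request locally; one fewer community request is now expected."""
        if file_id not in self.scores:
            return False
        self._push(file_id, self.scores[file_id] - 1)
        return True
```

Lazy deletion keeps `serve` at logarithmic cost without needing a decrease-key operation, at the price of some stale heap entries that are skipped during eviction.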
4. Experimental Setup
Mobile social networks (MSNs) have two components: the social network among users and the mobile network of user requests for content. Since the ownership of these two components lies with different entities, the availability of combined social and mobile data traces is limited. Due to this lack of real data, we simulate request arrivals at the base station using both synthetic and real-world community structures.
Since it is well known that most social networks (Twitter, Facebook, etc.) are scale-free, we use the affiliation graph model (AGM) (Yang and Leskovec, 2012) to construct the social community structure. We chose the AGM over other benchmark community generation models such as LFR, which generate user-user networks, since a user-community bipartite membership network suffices for our purpose: we do not use the pairwise links between users. We detail the process of realistic request arrival simulation in the source code.
Users request files according to their popularity, which follows the Zipf distribution, a well-known model of file popularity (Bastug et al., 2014). Popularity of a file is defined as , where is the steepness parameter of the popularity curve. A smaller value of implies that fewer files are more popular. is a static set of files (effectively infinite). Parameters for community estimation and identification are set as , , and empirically with no notable deviation from expectation with setting to other values. We evaluate the caching engine using the known community structure to avoid any bias of the community detection algorithm. Since community detection is a widely studied problem in itself, more sophisticated algorithms can be employed to improve performance.
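For readers who want to reproduce a Zipf-driven request stream, a minimal sketch follows. It assumes the common rank-based form in which the popularity of the file at rank r is proportional to r raised to the power -alpha over a finite library; the paper's own parameterization may differ, and `zipf_popularity`, `sample_requests`, `alpha`, and `seed` are hypothetical names.

```python
import random
from itertools import accumulate
from typing import List


def zipf_popularity(num_files: int, alpha: float) -> List[float]:
    """Normalized popularity per rank, assuming p(r) proportional to r ** -alpha."""
    weights = [rank ** -alpha for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(popularity: List[float], num_requests: int, seed: int = 0) -> List[int]:
    """Draw file indices (0 = most popular) according to `popularity`."""
    rng = random.Random(seed)  # fixed seed keeps a simulated trace reproducible
    cumulative = list(accumulate(popularity))
    return rng.choices(range(len(popularity)), cum_weights=cumulative, k=num_requests)
```

Under this assumed form, `alpha = 0` gives a uniform distribution and larger values concentrate requests on the top-ranked files.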
We compare the cache hit ratio, which directly translates to the traffic volume offloaded from the backhaul links, with five baseline schemes: FIFO, LFU, LRU, MPC, and Random (RND) caching. Since our focus is on what to cache, no meaningful comparison can be made with other state-of-the-art works, which primarily focus on where to cache. Other quality metrics, such as page load times and backhaul bandwidth usage, are easily derived from the hit ratio when network specifications (such as link capacities) are available. We study the impact on the hit ratio of three key system parameters: traffic volume, cache capacity, and content popularity.
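To make the metric concrete, the sketch below measures the hit ratio of one of the baselines, LRU, over a request trace. It is an illustrative helper rather than the evaluation harness used in the experiments; the function name `lru_hit_ratio` is hypothetical, and all files are assumed to have equal size, as in the network model.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_ratio(requests: Iterable[Hashable], capacity: int) -> float:
    """Fraction of requests served from an LRU cache holding `capacity` files."""
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    hits = total = 0
    for file_id in requests:
        total += 1
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
        else:
            cache[file_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used file
    return hits / total if total else 0.0
```

With equal-sized files, the hit ratio is also the fraction of request traffic that never crosses the backhaul link.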
5. Results and Discussion
Since a series of batches of requests is used to simulate their arrival, the traffic volume is controlled by the batch size parameter, which indicates the number of currently active communities (for which users are requesting content). As expected, the hit ratio decreases for all caching schemes with an increase in traffic volume. However, Bingo maintains a significant gain over the baselines, up to over the closest baseline, i.e. LRU. When traffic volume in the network is higher, a better cache hit ratio becomes more crucial to reduce response latency and backhaul load. We also observe that increasing the cache capacity widens the performance gap, so additional capacity is more fruitful for Bingo than for the baseline schemes.
Content popularity is controlled by the steepness parameter of the Zipf distribution. The hit ratio increases slightly for all caching schemes when this parameter is increased, i.e., when file popularity becomes less uniform: as the same popular content is requested more often (by more communities), more requests are served locally than when more diverse content is requested. Our experiments show that for , MPC, LFU, and LRU show a drastic increase in hit ratio whereas Bingo shows a relatively smaller increase. However, such skewed distributions are not realistic and do not reflect request patterns in the real world. Bingo consistently outperforms the baselines for practical popularity distributions (Yang et al., 2018; Bastug et al., 2014).
Note that Bingo outperforms MPC, the main competitor, since the number of files available is essentially infinite, and caching what more users are expected to request is better than expecting users to request only a few popular files. Furthermore, this implementation of MPC uses the exactly known file popularity distribution, which is not available in reality. We consider static popularity distributions since Bingo does not depend on the global popularity of the content and instead draws on the local dynamics of 'popularity'.
6. Conclusion
We proposed Bingo, a proactive edge caching scheme for cellular networks that utilizes structural information of the social network. For each requested content, we identify the approximate community that maintains the maximal interest in that content. The estimated size of the community, together with the current cache status, is used to make caching decisions. An MNO may choose to deploy Bingo at select sites based on expected revenues. We generate user requests using synthetic communities, which simulate real-world scenarios, to empirically evaluate Bingo. We demonstrate that Bingo substantially outperforms the (more) reactive caching schemes on varying network traffic volumes and community structures.
References
- Ali et al. (2021) S. Ali, M. H. Shakeel, I. Khan, S. Faizullah, and M. A. Khan. 2021. Predicting attributes of nodes using network structure. ACM Transactions on Intelligent Systems and Technology 12, 2 (2021), 1–23.
- Bastug et al. (2014) E. Bastug, M. Bennis, and M. Debbah. 2014. Living on the edge: The role of proactive caching in 5G wireless networks. IEEE Communications Magazine 52, 8, 82–89.
- Bernardini et al. (2013) C. Bernardini, T. Silverston, and O. Festor. 2013. MPC: Popularity-based Caching Strategy for Content Centric Networks. In IEEE International Conference on Communications. 3619–3623.
- Golrezaei et al. (2014) N. Golrezaei, P. Mansourifard, A. Molisch, and A. Dimakis. 2014. Base-Station Assisted Device-to-Device Communications for High-Throughput Wireless Video Networks. IEEE Transactions on Wireless Communications 13, 7, 3665–3676.
- Ji et al. (2016) M. Ji, G. Caire, and A. Molisch. 2016. Wireless Device-to-Device Caching Networks: Basic Principles and System Performance. IEEE Journal on Selected Areas in Communications 34, 176–189.
- Lu et al. (2015) Z. Lu, X. Sun, Y. Wen, G. Cao, and T. Porta. 2015. Algorithms and Applications for Community Detection in Weighted Networks. IEEE Transactions on Parallel and Distributed Systems 26, 11, 2916–2926.
- Müller et al. (2017) S. Müller, O. Atan, M. van der Schaar, and A. Klein. 2017. Context-Aware Proactive Content Caching With Service Differentiation in Wireless Networks. IEEE Transactions on Wireless Communications 16, 2, 1024–1036.
- Ong et al. (2014) M. Ong, M. Chen, T. Taleb, and X. Wang. 2014. FGPC: Fine-grained popularity-based caching design for content centric networking. In ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. 295–302.
- Poularakis et al. (2014b) K. Poularakis, G. Iosifidis, V. Sourlas, and L. Tassiulas. 2014b. Multicast-aware caching for small cell networks. In IEEE Wireless Communications and Networking Conference. 2300–2305.
- Poularakis et al. (2014a) K. Poularakis, G. Iosifidis, and L. Tassiulas. 2014a. Approximation Algorithms for Mobile Data Caching in Small Cell Networks. IEEE Transactions on Communications 62, 10, 3665–3677.
- Shanmugam et al. (2013) K. Shanmugam, N. Golrezaei, A. Dimakis, A. Molisch, and G. Caire. 2013. FemtoCaching: Wireless content delivery through distributed caching helpers. IEEE Transactions on Information Theory 59, 12, 8402–8413.
- Tran et al. (2017) T. Tran, A. Hajisami, and D. Pompili. 2017. Cooperative Hierarchical Caching in 5G Cloud Radio Access Networks. IEEE Network 31, 4, 35–41.
- Weifeng et al. (2020) L. Weifeng, Z. Mingqi, X. Jia, C. Siguang, Y. Lijun, and X. Jian. 2020. Cooperative caching game based on social trust for D2D communication networks. International Journal of Communication Systems 33, 9, e4380.
- Yang and Leskovec (2012) J. Yang and J. Leskovec. 2012. Community-Affiliation Graph Model for Overlapping Network Community Detection. In IEEE International Conference on Data Mining. 1170–1175.
- Yang et al. (2018) Y. Yang, Y. Wu, N. Chen, K. Wang, S. Chen, and S. Yao. 2018. LOCASS: Local Optimal Caching Algorithm with Social Selfishness for Mixed Cooperative and Selfish Devices. IEEE Access 6, 60–72.
- Zhang and Zhu (2018) X. Zhang and Q. Zhu. 2018. Hierarchical Caching for Statistical QoS Guaranteed Multimedia Transmissions over 5G Edge Computing Mobile Wireless Networks. IEEE Wireless Communications 25, 3, 12–20.
- Zhang et al. (2019) Y. Zhang, C. Li, T. Luan, Y. Fu, W. Shi, and L. Zhu. 2019. A Mobility-Aware Vehicular Caching Scheme in Content Centric Networks: Model and Optimization. IEEE Transactions on Vehicular Technology 68, 4, 3100–3112.
- Zhu et al. (2017) K. Zhu, W. Zhi, X. Chen, and L. Zhang. 2017. Socially Motivated Data Caching in Ultra-Dense Small Cell Networks. IEEE Network 31, 4, 42–48.