Multi-Manifold Learning for Large-scale Targeted Advertising System

07/05/2020 ∙ by Kyuyong Shin, et al. ∙ NAVER Corp.

Messenger advertisements (ads) give a direct and personal user experience, yielding high conversion rates and sales. However, people are skeptical about ads and sometimes perceive them as spam, which eventually decreases user satisfaction. Targeted advertising, which serves ads to individuals who may be interested in a particular advertising message, is therefore strongly required. The key to precise user targeting lies in learning accurate user and ad representations in the embedding space. Most previous studies have limited representation learning to Euclidean space, but recent studies have suggested hyperbolic manifold learning for the distinct projection of complex network properties emerging from real-world datasets such as social networks, recommender systems, and advertising. We propose a framework that can effectively learn the hierarchical structure of users and ads in hyperbolic space, and extend it to Multi-Manifold Learning. Our method constructs multiple hyperbolic manifolds with learnable curvatures and maps the representations of users and ads to each manifold. The origin of each manifold is set to the centroid of a user cluster. The user preference for each ad is estimated using the distance between the two entities in hyperbolic space, and the final prediction is determined by aggregating the values calculated from the learned manifolds. We evaluate our method on public benchmark datasets and on the large-scale commercial messenger system LINE, and demonstrate its effectiveness through improved performance.

1. Introduction

Figure 1. LINE messenger advertisement system.

The messenger platform is an emerging advertisement channel. On a messenger platform, users experience a message-type advertisement (ad) in a separate chat room, which feels more private and direct than traditional ad channels such as search engines and web portals. The high penetration of smartphones and SNS use makes messenger ad systems increasingly promising, with high sales (zhang2018mobile). Figure 1 shows an example of our LINE messenger advertisement system.

However, an ad sent to broad, random users without precise user targeting cannot resonate with its potential audience and instead acts as annoying spam. Since an accurate representation of users and ads is a necessity for a targeted advertising system, in this paper we present a deep representation learning scheme using hyperbolic geometries. Our method enables effective capture of the hierarchical and complex relationships between users and ads.

Figure 2. Minkowski space.

One of the most prominent approaches in traditional studies is Collaborative Filtering (CF) (schafer2007collaborative; zhang2016collaborative; chen2019collaborative), which finds a group of users who have responded to ads similar to the target ad. Owing to the limitation of the low-dimensional representation of CF, recent studies have presented various neural-network-based approaches (kim2019tripartite; yi2019deep; zhang2019deep; park2020hop) that can effectively embed users and advertisements into high-dimensional spaces. Despite the wide success and expansion of these methodologies, most of them implicitly embed the entities (i.e., users and ads) as points in Euclidean space, which inherently limits their representation power. Recent studies, however, have pointed out that real-world user-item interaction datasets exhibit hierarchical structures; therefore, it is more desirable to map embeddings into a hyperbolic space than into Euclidean space (wang2015exploring; tran2018hyperml; schmeier2019music; chamberlain2019scalable). Unlike in the flat (Euclidean) plane, the distance between the nodes in tree-structured data is preserved on the hyperboloid (gromov1987hyperbolic), and therefore hyperbolic geometry has been shown to be naturally suitable for modeling hierarchical structures.

Although hyperbolic space has successfully reflected the topology in user-item representations, existing approaches fix its origin and use a single manifold as the embedding space. In a real-world, large-scale advertising system, there exist various groups of users with different preference characteristics, and it may not be valid to assume that every user and advertisement entity can be expressed using a single geometry. Several studies in Euclidean space have improved prediction performance by adopting clustering algorithms in recommendation tasks (ungar1998clustering; dubois2009improving; gong2010collaborative). However, the use of a clustering scheme in hyperbolic manifold learning for advertising systems has not yet been reported. The main contribution of this paper is to extend hyperbolic representation learning to the Multi-Manifold Learning framework by constructing multiple hyperbolic spaces, each centered on a clustered user group.

We evaluate the proposed framework on a large-scale real-world dataset collected from the LINE messenger platform. The experimental results demonstrate that the proposed model increases prediction performance and allows the representation to be diversified. We further report performance on public benchmark datasets to show that Multi-Manifold Learning applies to tasks beyond targeted advertising.

2. Targeted advertising system

Unlike traditional forms of advertising that expose ads to random users, the core of targeted advertising is that the system sends ads to different user groups based on user-ad preferences. The targeted advertising system uses the side information of the ads, such as images and advertising phrases, as well as the users' demographic information and click history, to find the most relevant user group.

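Starting from $N_u$ users and $N_a$ advertisements with attribute matrices $X_u \in \mathbb{R}^{N_u \times F_u}$ and $X_a \in \mathbb{R}^{N_a \times F_a}$, the neural networks $f_u$ and $f_a$ transform $X_u$ and $X_a$ into $H_u \in \mathbb{R}^{N_u \times d}$ and $H_a \in \mathbb{R}^{N_a \times d}$, where $F_u$, $F_a$, and $d$ denote the number of attributes of users, the number of attributes of ads, and the number of hidden features, respectively. To build a more powerful user representation, we introduce an additional neural network $f_c$ that embeds the users' click history matrix $C \in \{0, 1\}^{N_u \times N_a}$ into $H_c \in \mathbb{R}^{N_u \times d}$, where $C_{ij}$ is one if there is a positive interaction between the $i$-th user and the $j$-th advertisement and zero otherwise.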

Finally, the preference scores between users and ads are computed through the distance or inner product between their embeddings:

(1)

where $d(x, y)$ is the distance between two points $x$ and $y$ on the given manifold. User preference scores are sorted for each ad, and the top users with the highest scores are selected as the targeted users for the ad. In this paper, we use the Fermi-Dirac decoder (krioukov2010hyperbolic; nickel2017poincare) as the decision function.
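As an illustration of the decision function, the following sketch maps manifold distances to probabilities with a Fermi-Dirac form, $1 / (e^{(d^2 - r)/t} + 1)$, and then selects the top users for one ad. The use of the squared distance, the default values of `r` and `t`, and the function names are assumptions made for illustration; the text does not specify them.

```python
import math


def fermi_dirac(distance, r=2.0, t=1.0):
    """Map a manifold distance to a probability in (0, 1).

    r (radius) and t (temperature) are hyperparameters; the defaults
    here are illustrative assumptions, not values from the paper.
    """
    return 1.0 / (math.exp((distance ** 2 - r) / t) + 1.0)


def top_users(distances_to_ad, k, r=2.0, t=1.0):
    """Rank users by preference score for one ad and keep the top k."""
    scores = [fermi_dirac(d, r, t) for d in distances_to_ad]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Hypothetical distances between four users and one ad.
distances = [0.3, 2.5, 1.0, 0.1]
selected, scores = top_users(distances, k=2)
print(selected)  # indices of the two closest users
```

A smaller distance gives a higher score, so ranking by score is equivalent to ranking by proximity on the manifold; the decoder only rescales distances into probabilities that a loss can consume.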

Figure 3. M.C. Escher style illustration of the Poincaré disk model.

3. Hyperbolic Geometry

3.1. Riemannian Manifolds

Figure 4. Conceptual scheme of our proposed method.

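A topological space $\mathcal{M}$ is a smooth manifold if it satisfies the following four conditions: it is Hausdorff; it is second countable; every point has an open neighborhood that is homeomorphic to an open subset of $\mathbb{R}^n$; and its transition maps are infinitely differentiable. For $x \in \mathcal{M}$, we can define the tangent space $T_x\mathcal{M}$, which is the first-order approximation of $\mathcal{M}$ around the point $x$.

A Riemannian manifold is a pair $(\mathcal{M}, g)$, where $\mathcal{M}$ is a differentiable manifold and $g$ is a Riemannian metric, i.e., a family of inner products $g_x$ on the tangent spaces $T_x\mathcal{M}$ that varies smoothly with $x$. The Riemannian metric is used to measure distances by integrating the length of a curve between two points:

$$d(x, y) = \inf_{\gamma} \int_0^1 \sqrt{g_{\gamma(t)}\big(\gamma'(t), \gamma'(t)\big)}\, dt$$ (2)

where $\gamma : [0, 1] \to \mathcal{M}$, $\gamma(0) = x$, and $\gamma(1) = y$. A shortest path between two points $x$ and $y$ is called a geodesic, and it is the equivalent of a straight line in Euclidean space. From geodesics, we can define a projection by using geodesic coordinates. This is called the exponential map at $x$, $\exp_x : T_x\mathcal{M} \to \mathcal{M}$, which projects a vector $v$ of the tangent space at $x$ to a point $\exp_x(v)$ on the manifold. In this map, $\gamma$ is the unique geodesic satisfying $\gamma(0) = x$ with unit-norm initial direction $\gamma'(0) = v / \|v\|$, and $\exp_x(v) = \gamma(\|v\|)$. Consequently, in a very local area, the exponential map satisfies $\exp_x(v) \approx x + v$. The reverse map is called the logarithmic map, $\log_x$, which maps points back to the tangent space at $x$ such that $\log_x(\exp_x(v)) = v$.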

3.2. Hyperbolic Space

Hyperbolic space is a non-Euclidean space with constant negative Gaussian curvature. Gaussian curvature is the product of the principal curvatures, and a space of constant curvature is spherical, hyperbolic, or flat depending on whether that value is positive, negative, or zero. Hyperbolic space is often associated with Minkowski spacetime in special relativity. The Minkowski (hyperboloid) model is a $d$-dimensional hyperbolic geometry in which points are represented on the forward sheet (inside the future light cone) of a two-sheeted hyperboloid in $(d+1)$-dimensional Minkowski space, as shown in Figure 2.

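Learning on hyperbolic manifold. Let $\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + x_1 y_1 + \dots + x_d y_d$ denote the Minkowski inner product, with the coordinates $x_0$ and $y_0$ representing time. We denote by $\mathbb{H}^{d,K} = \{x \in \mathbb{R}^{d+1} : \langle x, x \rangle_{\mathcal{L}} = -K,\ x_0 > 0\}$ the hyperbolic manifold with constant negative curvature $-1/K$ ($K > 0$), and by $T_x\mathbb{H}^{d,K}$ the tangent space centered at point $x$. As described in Section 3.1, mapping between the tangent space and the manifold is performed by the exponential and logarithmic maps. Closed-form expressions of these maps are known for the hyperboloid, which allow us to map points on the hyperboloid to tangent spaces and vice versa. For $x \in \mathbb{H}^{d,K}$, $v \in T_x\mathbb{H}^{d,K}$, and $y \in \mathbb{H}^{d,K}$ such that $v \neq 0$ and $y \neq x$, the exponential and logarithmic maps of the hyperboloid model are given by:

$$\exp_x^K(v) = \cosh\left(\frac{\|v\|_{\mathcal{L}}}{\sqrt{K}}\right) x + \sqrt{K} \sinh\left(\frac{\|v\|_{\mathcal{L}}}{\sqrt{K}}\right) \frac{v}{\|v\|_{\mathcal{L}}}$$ (3)

$$\log_x^K(y) = d_{\mathcal{L}}^K(x, y)\, \frac{y + \frac{1}{K} \langle x, y \rangle_{\mathcal{L}}\, x}{\left\| y + \frac{1}{K} \langle x, y \rangle_{\mathcal{L}}\, x \right\|_{\mathcal{L}}}$$ (4)

where $\|v\|_{\mathcal{L}} = \sqrt{\langle v, v \rangle_{\mathcal{L}}}$ denotes the norm of $v$ and $d_{\mathcal{L}}^K(x, y) = \sqrt{K}\, \operatorname{arcosh}\left(-\langle x, y \rangle_{\mathcal{L}} / K\right)$ denotes the geodesic distance between $x$ and $y$. The above expressions assume that $\exp_x^K(tv) \approx x + tv$ when $t$ is small enough and the tangent vector $v$ is unit-speed, i.e., $\langle v, v \rangle_{\mathcal{L}} = 1$.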
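To make the two maps concrete, the sketch below implements the standard closed-form exponential map, logarithmic map, and geodesic distance of the hyperboloid model with curvature $-1/K$ in plain Python. The function names, the list-based vectors, and the numerical guards are assumptions for illustration and are not taken from the authors' implementation.

```python
import math


def minkowski_dot(x, y):
    """Minkowski inner product: the first (time) coordinate is negated."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap(x, v, K=1.0):
    """Map a tangent vector v at x onto the hyperboloid of curvature -1/K."""
    norm = math.sqrt(max(minkowski_dot(v, v), 0.0))
    if norm < 1e-12:  # zero tangent vector: stay at x
        return list(x)
    s = math.sqrt(K)
    return [math.cosh(norm / s) * xi + s * math.sinh(norm / s) * vi / norm
            for xi, vi in zip(x, v)]


def distance(x, y, K=1.0):
    """Geodesic distance; the argument is clamped to the domain of acosh."""
    return math.sqrt(K) * math.acosh(max(-minkowski_dot(x, y) / K, 1.0))


def logmap(x, y, K=1.0):
    """Map a point y on the hyperboloid back to the tangent space at x."""
    inner = minkowski_dot(x, y)
    u = [yi + inner / K * xi for xi, yi in zip(x, y)]
    u_norm = math.sqrt(max(minkowski_dot(u, u), 0.0))
    if u_norm < 1e-12:  # y coincides with x
        return [0.0] * len(x)
    d = distance(x, y, K)
    return [d * ui / u_norm for ui in u]


origin = [1.0, 0.0, 0.0]   # (sqrt(K), 0, 0) with K = 1
v = [0.0, 0.3, -0.4]       # tangent at the origin: time coordinate is 0
y = expmap(origin, v)
v_back = logmap(origin, y)
print(y, v_back)
```

The round trip `logmap(origin, expmap(origin, v))` recovers `v`, and the geodesic distance from the origin to `y` equals the norm of `v`, which is the property that lets Euclidean network outputs be treated as tangent vectors.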

Diffeomorphism. The hyperboloid model tends to be more robust and stable than the Poincaré model, but the Poincaré model is easier to interpret and allows embeddings to be visualized directly on the Poincaré disk. Fortunately, the Poincaré disk is a stereographic projection of the hyperboloid (forrester2009derivation), which means these two models are homeomorphic and there exists a diffeomorphism mapping the hyperboloid model onto the Poincaré model:

$$\Pi(x_0, x_1, \dots, x_d) = \frac{\sqrt{K}\,(x_1, \dots, x_d)}{x_0 + \sqrt{K}}$$ (5)

where $-1/K$ is the curvature of the hyperboloid. We use this diffeomorphism to visualize the data embeddings in Figure 5.
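The projection used for visualization can be sketched as follows. The code applies the standard stereographic map from the hyperboloid of curvature $-1/K$ to the Poincaré ball of radius $\sqrt{K}$, together with its inverse; the function names are hypothetical and the sketch assumes list-based vectors.

```python
import math


def hyperboloid_to_poincare(x, K=1.0):
    """Stereographic projection of a hyperboloid point onto the Poincare ball."""
    s = math.sqrt(K)
    return [s * xi / (x[0] + s) for xi in x[1:]]


def poincare_to_hyperboloid(p, K=1.0):
    """Inverse projection from the Poincare ball back to the hyperboloid."""
    s = math.sqrt(K)
    sq = sum(pi * pi for pi in p)
    return [s * (K + sq) / (K - sq)] + [2.0 * K * pi / (K - sq) for pi in p]


# A point at hyperbolic distance 1 from the origin of the unit hyperboloid.
x = [math.cosh(1.0), math.sinh(1.0), 0.0]
p = hyperboloid_to_poincare(x)
x_back = poincare_to_hyperboloid(p)
print(p, x_back)
```

For $K = 1$, a point at hyperbolic distance $r$ from the origin lands at Euclidean radius $\tanh(r/2)$ in the disk, so arbitrarily distant points are compressed toward, but never reach, the boundary.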

Why hyperbolic manifold for targeted advertising. The hyperbolic manifold is often considered a well-suited space for hierarchical structures. Consider the task of embedding a tree with branching factor $b$ into a metric space while preserving its structural properties, i.e., the number of nodes at the $\ell$-th layer is $b^{\ell}$, which grows exponentially with depth. In Euclidean space, the length of a circle of radius $r$ is $2\pi r$, which grows only linearly. As a result, Euclidean space cannot contain all the nodes of the tree without crowding them, which leads to poor representations. However, in hyperbolic space with constant Gaussian curvature $-\zeta^2$, the length of a circle of radius $r$ is $\frac{2\pi}{\zeta} \sinh(\zeta r)$. Since $\sinh(\zeta r) = \frac{1}{2}\left(e^{\zeta r} - e^{-\zeta r}\right)$, the circle length grows exponentially with $r$, enough to include all the nodes. This property is illustrated in Figure 3. Each triangle has the same area in hyperbolic space, although in the Euclidean rendering the triangles appear to shrink rapidly toward the boundary. Recent studies have pointed out that real-world user-ad interactions exhibit hierarchical relationships (nickel2017poincare; chami2019hyperbolic); thus, the properties of hyperbolic space have great potential for learning distinct representations in targeted advertising systems.
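The growth argument can be checked numerically. The sketch below compares the number of nodes at a given tree depth with the circumference of a circle of the same radius in Euclidean and hyperbolic space. The binary branching factor, the unit curvature parameter, and the function names are illustrative assumptions.

```python
import math


def nodes_at_depth(branching, depth):
    """Number of nodes at a given depth of a complete tree."""
    return branching ** depth


def euclidean_circumference(r):
    """Circle length in the flat plane: linear in the radius."""
    return 2.0 * math.pi * r


def hyperbolic_circumference(r, zeta=1.0):
    """Circle length in hyperbolic space of curvature -zeta**2."""
    return 2.0 * math.pi * math.sinh(zeta * r) / zeta


for depth in (1, 5, 10):
    print(depth,
          nodes_at_depth(2, depth),
          round(euclidean_circumference(depth), 1),
          round(hyperbolic_circumference(depth), 1))
```

At depth 10 a binary tree has 1024 nodes, more than the Euclidean circumference of a radius-10 circle can hold at unit spacing, while the hyperbolic circumference at the same radius is far larger.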

4. Multi-Manifold Learning

The core functionality of our large-scale targeted advertising system is to capture the representational differences between various user groups and advertisements. To do this, we propose Multi-Manifold Learning, which builds multiple manifolds for user groups, because it may not be valid to assume that every user entity can be expressed using a single geometry. The conceptual scheme of our Multi-Manifold Learning is shown in Figure 4.

Our proposed method consists of three stages. First, the inputs $X_u$ and $X_a$ pass through the DNNs $f_u$ and $f_a$ separately, and the users' click history $C$ passes through a transformer network $f_c$ (vaswani2017attention). Second, we calculate the user embedding by adding $H_u = f_u(X_u)$ and $H_c = f_c(C)$, and cluster the user embeddings into $M$ groups using $k$-means clustering (alsabti1997efficient). We denote by $c_m$ the centroid of the $m$-th group for $m = 1, \dots, M$. Finally, the embedding vectors are mapped onto each $m$-th hyperbolic manifold, whose origin is $c_m$. Then, we calculate the preference score on each manifold using a Fermi-Dirac decoder (krioukov2010hyperbolic; nickel2017poincare) and aggregate the scores.

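The detailed process can be formulated as follows:

$$h^{(m)} = h - c_m$$ (6)
$$h^{\mathbb{H},(m)} = \exp_{o}^{K_m}\left(\left(0, h^{(m)}\right)\right)$$ (7)

where $h^{(m)}$ denotes the embedding vector $h$ of a user or an ad centered by the centroid $c_m$ of the $m$-th user group. The vector $h^{\mathbb{H},(m)}$ represents the Euclidean vector mapped onto the $m$-th hyperbolic manifold, which has learnable curvature $-1/K_m$, with respect to the origin $o = (\sqrt{K_m}, 0, \dots, 0)$. It is essential to center the embeddings with respect to the centroid of each user group, because optimization often fails if the manifold's origin is set to a point other than the origin $o$. The embeddings on each $m$-th hyperbolic manifold are used to compute the user preference score through the Fermi-Dirac decoder. Finally, our overall probability and loss are: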

(8)
(9)

Here the per-manifold term is the probability of an interaction between a user and an ad on a single manifold, and the aggregated term is the user preference over all manifolds. The parameters $r$ and $t$ of the Fermi-Dirac decoder are hyperparameters.

After mapping embeddings on the hyperboloid, an additional neural network layer such as Hyperbolic Neural Network (HNN) (ganea2018hyperbolic) can be added to perform weight learning on the hyperbolic manifold, but we empirically found that it does not show any performance improvements.
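The stages described in this section can be sketched end to end for a single user-ad pair. The code below centers both embeddings on each cluster centroid, maps them onto a hyperboloid with the exponential map at the origin, scores the pair with a Fermi-Dirac decoder, and aggregates over manifolds. Several parts are assumptions for illustration: the centroids are fixed hypothetical values rather than the output of $k$-means, the curvatures are constants rather than learned parameters, the decoder uses the squared distance, and the aggregation is a plain mean, since the text does not state the aggregation rule.

```python
import math


def minkowski_dot(x, y):
    """Minkowski inner product: the first (time) coordinate is negated."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap_origin(h, K=1.0):
    """Exponential map at o = (sqrt(K), 0, ..., 0) of the tangent vector (0, h)."""
    s = math.sqrt(K)
    norm = math.sqrt(sum(a * a for a in h))
    if norm < 1e-12:
        return [s] + [0.0] * len(h)
    return [s * math.cosh(norm / s)] + [s * math.sinh(norm / s) * a / norm for a in h]


def hyperboloid_distance(x, y, K=1.0):
    """Geodesic distance on the hyperboloid of curvature -1/K."""
    return math.sqrt(K) * math.acosh(max(-minkowski_dot(x, y) / K, 1.0))


def fermi_dirac(d, r=2.0, t=1.0):
    """Fermi-Dirac decoder; r and t are illustrative hyperparameter values."""
    return 1.0 / (math.exp((d * d - r) / t) + 1.0)


def multi_manifold_score(user, ad, centroids, curvatures, r=2.0, t=1.0):
    """Score one user-ad pair on every manifold and aggregate by the mean (assumed)."""
    probs = []
    for c, K in zip(centroids, curvatures):
        u_h = expmap_origin([a - b for a, b in zip(user, c)], K)  # center, then map
        a_h = expmap_origin([a - b for a, b in zip(ad, c)], K)
        probs.append(fermi_dirac(hyperboloid_distance(u_h, a_h, K), r, t))
    return sum(probs) / len(probs), probs


# Hypothetical 2-D embeddings, centroids, and curvature parameters.
centroids = [[0.0, 0.0], [1.0, 1.0]]
curvatures = [1.0, 2.0]
user = [0.2, 0.1]
near_score, _ = multi_manifold_score(user, [0.25, 0.1], centroids, curvatures)
far_score, _ = multi_manifold_score(user, [-1.5, 2.0], centroids, curvatures)
print(near_score, far_score)
```

Because hyperbolic distance is not translation-invariant, the same user-ad pair receives a different distance on each manifold once it is centered on a different centroid, which is what makes combining several manifolds informative.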

Table 1. Model performance on the LINE messenger advertisement system.

Model            RocAuc   Accuracy   Average Precision   Shannon Entropy
CF               0.756    0.673      0.786               14.431
MLP              0.770    0.681      0.775               14.197
HNN              0.778    0.753      0.841               14.451
Multi-Manifold   0.818    0.765      0.846               14.567

Table 2. Model performance on the public benchmark MovieLens datasets.

                 MovieLens-1M                   MovieLens-100K
Model            RocAuc   Average Precision     RocAuc   Average Precision
CF               60.3     67.4                  60.5     61.1
MLP              57.4     66.3                  61.7     62.0
HNN              61.7     69.0                  68.0     67.8
Multi-Manifold   61.5     69.8                  68.3     68.5

5. Experiment

5.1. Dataset

Table 3. Model performance comparison as the number of clusters increases on the LINE messenger dataset.

# of Clusters   RocAuc   Accuracy   Average Precision   Shannon Entropy
1-cluster       0.798    0.715      0.817               13.871
3-cluster       0.805    0.753      0.840               14.146
5-cluster       0.818    0.765      0.846               14.567
10-cluster      0.813    0.753      0.841               14.653
15-cluster      0.810    0.753      0.841               14.665

Figure 5. Visualization of the embedding representations of users and advertisements. (a) Embedding of the Euclidean MLP model, (b) Poincaré disk visualization of the HNN model, (c) Poincaré disk visualization of our Multi-Manifold Learning with two clusters (one manifold per cluster). We visualize them using the diffeomorphism between the hyperboloid model and the Poincaré model, as described in Section 3.2.

We collect a dataset from the LINE messenger platform, which targets users from all over the world; the service has about 200 million users. We randomly select one million users. (Footnote 1: Since the number of users of the service is huge, we use a subset of users for the experiments.)

We split the dataset based on time: the first fourteen days for training and the subsequent two days for testing. We report the performance of the last epoch.

For better representation of user and advertisement embeddings, the age, gender, mobile OS type, interest, number of LINE Pay membership followers, and number of LINE Pay membership followees attributes are used for users, while text and image are used for advertisements. For each attribute, we use shallow DNNs to make fixed-dimensional feature vectors and aggregate them to get the user and advertisement feature matrices.

Due to a large number of users, we use 10,240 randomly sampled users for each batch. The advertisement click history is used up to the day before the forecast date, and click histories are normalized for each user.

For a fair comparison with the base models, we extend our experiments to a public benchmark dataset, MovieLens (https://grouplens.org/datasets/movielens), which is a widely used public dataset for recommender systems. We convert the dataset to a binary classification task: the label is 1 if the movie score is greater than 4, and 0 otherwise.
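The label conversion described above amounts to a single threshold. The following sketch assumes integer ratings and a hypothetical helper name; the threshold of 4 with a strict comparison follows the text.

```python
def binarize_rating(score, threshold=4):
    """Return 1 if the movie score is strictly greater than the threshold, else 0."""
    return 1 if score > threshold else 0


# Hypothetical ratings on an integer scale.
ratings = [1, 3, 4, 5]
labels = [binarize_rating(s) for s in ratings]
print(labels)
```

With a strict comparison, a rating of exactly 4 is labeled 0, so on an integer scale only the highest ratings count as positive interactions.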

5.2. Baselines

To demonstrate the effectiveness of the proposed model, we compare our model with the following three baseline models:

  • Collaborative Filtering (CF) (schafer2007collaborative; zhang2016collaborative; chen2019collaborative): The underlying assumption of Collaborative Filtering is the premise that users’ past trends will remain the same in the future. In other words, it is a technique to identify users with similar patterns based on their preferences and interests.

  • Multilayer Perceptron (MLP) (xue2017deep; yi2019deep; zhang2019deep): There are numerous types of MLP algorithms that are based on Matrix Factorization. We report the presented framework using Euclidean space as MLP in the following results. Note that Multi-Manifold Learning in Euclidean space is not reported, since the relative distance between two points is translation-invariant in Euclidean space, so re-centering the embeddings on different cluster centroids would not change the distances.

  • Hyperbolic Neural Network (HNN) (ganea2018hyperbolic): This work generalizes the linear transform and bias addition of DNNs to hyperbolic space and proposes several important deep learning tools for hyperbolic space. We use HNN, which is based on basic DNNs, where the core operations are executed in hyperbolic space.

For fairness of comparison, we adopt the same neural network architectures for the user and ad networks. The hidden vector size is set to 64, and we use neither dropout (srivastava2014dropout) nor L2 regularization. All the experiments were performed on the NAVER SMART Machine Learning platform (NSML) (sung2017nsml; kim2018nsml) using PyTorch (NEURIPS2019_9015).

5.3. Experimental Results and Analysis

We report three accuracy metrics, RocAuc, Accuracy, and Average Precision, and one diversity metric, Shannon Entropy. In particular, Average Precision is tailored to the targeted advertising system: the user preference scores are sorted for a specific advertisement, precision is calculated over the top users of each ad, and finally the precision is averaged over all ads.
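One possible reading of these two metrics is sketched below. The precision function follows the description above (rank users per ad, take the top users, average over ads). The entropy function is an assumption: the text does not say which distribution the entropy is computed over or which logarithm base is used, so the sketch takes counts of how often each user is recommended and uses base 2. The function names and the cutoff `k` are hypothetical.

```python
import math


def average_topk_precision(scores_per_ad, labels_per_ad, k):
    """Average, over ads, of the precision among each ad's k highest-scored users."""
    precisions = []
    for scores, labels in zip(scores_per_ad, labels_per_ad):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        precisions.append(sum(labels[i] for i in ranked[:k]) / k)
    return sum(precisions) / len(precisions)


def shannon_entropy(counts):
    """Entropy (base 2, assumed) of a distribution given by non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical scores and click labels for two ads and three users.
scores_per_ad = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.6]]
labels_per_ad = [[1, 0, 0], [0, 1, 1]]
print(average_topk_precision(scores_per_ad, labels_per_ad, k=2))
print(shannon_entropy([1, 1, 1, 1]))
```

Under this reading, a model that always recommends the same few users has low entropy, while a model that spreads its recommendations evenly across users reaches the maximum.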

Performance comparison. As shown in Table 1, our method shows the best prediction performance on all accuracy metrics, as well as on the diversity metric. The Shannon entropy of each model shows how diversified the recommended users are. Our model shows the highest diversity among the compared models, indicating that the superior expressiveness of its embeddings enables precise targeting. To further demonstrate the effectiveness of our model on general datasets, we present additional experimental results on the public benchmark dataset MovieLens. As shown in Table 2, our model shows the best or second-best performance, demonstrating that it generalizes and is not overfitted to a certain dataset.

Effects of the number of clusters. To illustrate the effect of the number of clusters on model performance, we report the prediction accuracy and diversity of our model for different numbers of clusters. Table 3 shows that the overall performance improves as the number of clusters grows, and the best prediction performance is obtained with five clusters. Beyond five clusters, the prediction accuracy levels off (RocAuc of 0.813 and 0.810 for 10 and 15 clusters, versus 0.818 for five) while the diversity improves further. Overall, we set the number of clusters to five throughout the experiments.

Embedding visualization. Figure 5 shows how Multi-Manifold Learning works compared to the other models. From the embedding visualizations of the MLP and HNN models, we can see that the pool of positive users responding to ads is very small. In contrast, the visualization of our Multi-Manifold Learning shows that the model places many users in the positive pool that is compatible with the ads.

Data embedded in different hyperboloids, which originate from the centroids of different user groups, occupy different embedding spaces. Since we set two clusters, Figure 5 shows results for two manifolds. We can see that each manifold yields different distances in the hyperbolic space: ads that are not relevant to the user are pushed to the edge, while preferred ads move toward the center. This is because each hyperbolic space we construct is centered on the centroid of a well-clustered user group.

6. Conclusion

Traditional targeted advertising systems struggle with data representation because of the inherent limitation of Euclidean space. To tackle this issue, we present Multi-Manifold Learning, a technique for learning better representations of users and advertisements. Experimental results show that the proposed scheme improves targeted advertising quality in terms of both accuracy and diversity. As future work, we will develop a Multi-Manifold Learning scheme in terms of diffeomorphism learning. In addition, we will extend our method to the real-world, large-scale online service of the LINE messenger platform.

Acknowledgements.
The authors would like to thank Professor Hyunwoo J. Kim and NAVER Clova ML X team for insightful comments and discussion.

References