
Boosting Sensitivity of Large-scale Online Experimentation via Dropout Buyer Imputation

09/09/2022
by   Sumin Shen, et al.
eBay
Virginia Polytechnic Institute and State University

Metrics provide strong evidence to support hypotheses in online experimentation and hence reduce debates in the decision-making process. In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a cluster-based k-nearest neighbors-based imputation method. Our proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation in large-scale data sets in online experimentation, the proposed method uses a combination of stratification and clustering. The performance of the proposed method was compared to several conventional methods in a past experiment at eBay.



Introduction

Online experimentation has been playing a key role in data-driven decision making in the IT industry, including Microsoft (kohavi2009online; kohavi2014seven), Google (tang2010overlapping), LinkedIn (xu2018sqr), Netflix (xie2016improving), Uber, eBay (nie2020dealing), and many others (gupta2019top). Generally, online controlled experimentation, also known as A/B testing, is conducted for a pre-determined amount of time to compare the difference in metrics between the treatment group and the control group, to which users are randomly assigned. Prior to experimentation, a set of high-quality metrics is determined to assess the effects of new features in the treatment group. The collected metric results can provide strong evidence to support hypotheses and hence accelerate the decision-making process (deng2016data; machmouchi2016principles; dmitriev2016measuring). In this study, we focus on the analysis of metrics that have incomplete measurements at the end of data collection in experiments.

According to their positions in the shopping funnel, metrics can be categorized as top-, middle-, and bottom-funnel metrics. For instance, a successful purchase typically requires users to take multiple steps from the homepage at the top of the shopping funnel to the purchase page at the bottom. In online experimentation, it is common for millions of users to arrive at the top of the funnel (e.g., the homepage), while only a small percentage of users reach the bottom (e.g., the purchase page). During the transition from the top of the funnel to the bottom, users navigate through multiple pages, at any of which they can exit the shopping process. There are numerous scenarios in which users can exit the funnel, resulting in incomplete records of their purchases or other metrics. A common cause is simply that each experiment has its own duration. Keeping experiments alive for a long period of time is expensive due to high operational effort and business opportunity costs. When we close an experiment, we stop tracking all users, but some users might not yet have completed their purchases. This incompleteness in metrics, due to the delay in collecting measurements for bottom-funnel metrics, is inevitable. There is also the possibility that users are lost to follow-up due to technical issues or user unavailability. For instance, when users switch from the desktop app to the mobile app, they become unavailable. It is essential to fill in the incomplete metrics to improve metric quality, leading to trustworthy results and better decisions.

With incomplete metric measurements, the inference of the difference in metrics between the treatment and the control in experiments is at risk of being biased and inaccurate (imbens2001estimating; imbens2000analysis; goldstein2007subtle). To analyze experiments with missing metric values, a naive approach is to disregard users with incomplete outcomes. This approach assumes that the missingness is completely at random and that the fully observed users are representative of the entire population. Such an approach reduces the total number of users in the study, leading to a decrease in the experiment's statistical power. The power decrease is substantial especially when the proportion of missingness is high.

Various imputation methods have been developed to address problems with missing data. One widely used method is the single imputation method, which fills in missing values with a single value, such as the mean of observed outcomes, for both the treatment group and the control group. The single imputation method preserves the full sample size, but it raises concerns regarding results with a distorted distribution and underestimated uncertainty (spineli2020comparison). In addition, the single imputation method disregards information from other observed variables collected along users' journeys within the funnel. Other imputation methods have been developed for missing at random (MAR) and missing not at random (MNAR) scenarios. MAR assumes that the missing mechanism is associated only with the observed variables (rubin1976inference; imai2009statistical; bhaskaran2014difference). Likelihood-based methods, such as generalized linear mixed models, have been developed for clinical trials with incomplete outcomes (molenberghs2004analyzing). The performance of these methods depends on the degree to which the MAR assumptions hold. For MNAR, in which the effect of missing outcomes is non-ignorable, the observed difference would be a biased estimate of the average treatment effect (molenberghs2004analyzing). Regression-based imputation methods, such as logistic regression, are employed to model the missingness indicator (mao2021driving). Other prevalent methods, such as matching imputation, identify similar users from a set of variables. In general, these imputation methods require distinguishing users with missing outcomes from users with outcomes of zero. In other words, general imputation methods are incapable of handling certain online experimentation scenarios in which users' missing outcomes represent both missing cases and zero cases.

To address these challenges, we propose a cluster-based k-nearest neighbors (kNN) imputation method for the analysis of online controlled experimentation in the presence of incomplete metrics. The idea is to impute users' incomplete metrics from their neighbors by incorporating the structural information of the online experimentation data set. Specifically, the proposed method consists of two steps. The first step is to partition the data set into clusters after stratification on experiment-specific features, specifically the treatment assignment and the buyers' characteristics. In the second step, we perform the kNN-based imputation within each cluster. We intend to improve metric quality so that the experiment results can be trustworthy and better data-driven decisions can be made. In our framework, the treatment assignment and user covariates are fully observed, whereas only the outcome at the bottom of the funnel has missing values. In addition, we divide users with missing outcomes into two categories: visitors and dropout buyers. The proposed method has three key advantages. First, it uses the informative covariates along users' journeys in the shopping funnel to impute incomplete metrics; specifically, it captures the heterogeneous impact of different user segments on missing rates in metrics. Second, the imputed values from our method are not limited to a single value. Lastly, our method employs stratification and clustering to alleviate computational issues in large-scale online experimentation data sets.

Throughout the paper, we consider the metric purchase as an example of the incomplete metric at the funnel’s bottom for illustration. We also assume that purchase is the only metric (i.e., outcome) of interest in the experiment. The rest of the paper is organized as follows. In Sections 2 and 3, we detail the problem formulation, the proposed method, and the estimation procedures. In Section 4, we describe the competing methods and performance measures. A real case study is conducted in Section 5. We conclude this work with some discussion in Section 6.

Problem Formulation

In the context of online controlled experiments, we can classify users into three types based on their purchase behaviors: visitors, real buyers, and dropout buyers. Visitors participate in experiments but do not make contributions (e.g., purchases). Real buyers not only participate in experiments but also make contributions (e.g., purchases). Dropout buyers could have made their contributions (e.g., completed their transactions) within the experimentation period but failed to do so for various reasons. For example, users could drop out of the experiment because of unexpected external payment issues. Another example is that the experiment lost users due to various technical issues.

Suppose there are $N$ users in an experiment. Let $\delta_i$ denote whether the $i$-th user is a buyer or not, and let $y_i$ denote the metric value of the $i$-th user impacted by the experimentation. That is, $\delta_i = I(y_i > 0)$, where $I(\cdot)$ is the indicator function. We know for sure that user $i$ is a real buyer, with the corresponding value amount $y_i$, if he/she has completed transaction(s) during the experimentation period. In other cases, it is ambiguous whether he/she is a dropout buyer or merely a visitor. Therefore, we use $\delta_i = 1$ and $y_i > 0$ if the $i$-th user is a real buyer, and $\delta_i = \mathrm{NA}$ and $y_i = \mathrm{NA}$ to represent the ambiguous situation (i.e., the user could be a dropout buyer or a visitor).

However, some practitioners arbitrarily treat all $\delta_i = \mathrm{NA}$ and $y_i = \mathrm{NA}$ as 0 without the diligence to distinguish between dropout buyers and visitors. Here, we denote such an arbitrary but simplified buyer indicator as $\tilde{\delta}_i$, where $\tilde{\delta}_i = \delta_i$ if $y_i$ is observed and $\tilde{\delta}_i = 0$ otherwise. The corresponding vectors are denoted as $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_N)^\top$ and $\tilde{\boldsymbol{\delta}} = (\tilde{\delta}_1, \ldots, \tilde{\delta}_N)^\top$.

Additionally, let $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^\top$ denote the $p$ relevant features for user $i$, $i = 1, \ldots, N$. Without loss of generality, we assume that the features are continuous variables.

Suppose there are $n$ real buyers among the $N$ total users, and without loss of generality, let us assume the first $n$ users are real buyers. Denote users' purchase indicators and transactional amounts during the experimentation period by the vectors $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n, \delta_{n+1}, \ldots, \delta_N)^\top$ and $\mathbf{y} = (y_1, \ldots, y_n, y_{n+1}, \ldots, y_N)^\top$, where the last $N - n$ entries are missing.

The problem of interest is to impute the missing values $\delta_{\mathrm{mis}} = (\delta_{n+1}, \ldots, \delta_N)^\top$ and $y_{\mathrm{mis}} = (y_{n+1}, \ldots, y_N)^\top$ in the context of online experimentation. Among users with missing values, visitors are mixed with dropout buyers. Therefore, our proposed method first identifies the candidates of dropout buyers (i.e., the candidates of 1s in $\delta_{\mathrm{mis}}$) with the help of a classification model, and then imputes $\delta_{\mathrm{mis}}$ and $y_{\mathrm{mis}}$ using an efficient cluster-based nearest neighbors approach.

The objective of the imputation problem is to impute missing values such that they are close to the underlying true data. The missing value imputation problem can be formulated as

$\min_{\hat{y}_{\mathrm{mis}}} \; L(\hat{y}_{\mathrm{mis}}, y_{\mathrm{mis}}),$

where $L(\cdot, \cdot)$ is a loss function quantifying the difference between the imputed missing values $\hat{y}_{\mathrm{mis}}$ and the underlying true values $y_{\mathrm{mis}}$.

Imputing missing values with non-parametric methods such as the nearest neighbors algorithm in large-scale data sets is challenging due to the large computational requirements for distances between pairs of data points. To address this challenge, we propose to incorporate the data clustering patterns into the imputation. In other words, we partition users into clusters and then perform imputations within each cluster. Thus, the cluster-based imputation problem is described as

$\min_{\hat{y}_{\mathrm{mis}}} \; L(\hat{y}_{\mathrm{mis}}, y_{\mathrm{mis}}) \quad \text{subject to} \quad \|\mathbf{x}_i - \mathbf{c}_{g(i)}\|_2 \le r, \; i \in \mathcal{M}, \qquad (1)$

where $\mathbf{x}_i$ denotes the features for user $i$, the constraint represents that user $i$ with a missing value belongs to cluster $g(i)$ with centroid $\mathbf{c}_{g(i)}$, the constant $r$ controls the within-cluster distances, and $\|\cdot\|_2$ is the L2-norm. The set of indices of users with missing values is defined as $\mathcal{M} = \{n+1, \ldots, N\}$. After imputing $\hat{y}_{\mathrm{mis}}$, we can estimate the corresponding $\hat{\delta}_{\mathrm{mis}}$ as well.

Note that it is unknown whether a user with an incomplete metric is a visitor or a dropout buyer: the dropout buyers are mixed with visitors because neither group has purchase information recorded. To address this challenge, in Section 3.1 we apply a logistic regression model to identify a large portion of visitors and narrow down the candidates of dropout buyers. Section 3.2 details the proposed cluster-based imputation. Notice that the data set in online controlled experiments is often so large that conventional clustering methods cannot be conducted efficiently. To alleviate this computational issue, Section 3.3 considers a stratification-based clustering and describes how to choose the number of clusters.

The Proposed Method

The practitioners' simplified buyer indicator $\tilde{\delta}_i$ reveals partial information about the true buyer indicator $\delta_i$. Therefore, a classification model based on $\tilde{\delta}_i$ provides us with the likelihood of purchases. Users with a high likelihood but missing purchase records can serve as candidates for dropout buyers. Since $\tilde{\delta}_i$ is used as a substitute for $\delta_i$, we call $\tilde{\delta}_i$ the pseudo-response.

Specifically, we propose to apply the logistic regression model for buyer identification. Denote the conditional probability for user $i$ as $p_i = \Pr(\tilde{\delta}_i = 1 \mid \mathbf{x}_i)$. We model the conditional probability with the logistic model $\log\left(p_i / (1 - p_i)\right) = \mathbf{x}_i^\top \boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is the vector of regression coefficients. Note that the features used in the logistic regression model are believed to be closely related to users' purchase behaviors. A threshold is needed in the logistic model for classification; one widely used threshold value is 0.5.

Comparing the model prediction and the pseudo-response, Table 1 summarizes four types of classification results: false positive (FP), true negative (TN), false negative (FN), and true positive (TP). An FP case indicates that a user with pseudo-response 0 is predicted to have purchase information. We use this inconsistency to identify the candidates of dropout buyers; that is, the FP cases can be either visitors or dropout buyers. A TN case reflects agreement that the user has no purchases recorded, so we treat all TN cases as visitors. The FN and TP cases are users with recorded purchase behaviors, and hence they are real buyers, not dropout buyers or visitors.

Category Pseudo-response Prediction Description
True Negative (TN) 0 0 Visitors
False Positive (FP) 0 1 Candidates of dropout buyers
False Negative (FN) 1 0 Real buyers
True Positive (TP) 1 1 Real buyers
Table 1: Summary of the four categories of results in the logistic regression model.
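The candidate-identification step above can be sketched with scikit-learn. The data, features, and coefficients below are synthetic illustrations, not the actual eBay model; only the FP/TN logic follows the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic session-activity features and pseudo-response: 1 if a purchase
# was recorded during the experiment, 0 otherwise (visitors mixed with dropouts).
n_users = 1000
X = rng.poisson(lam=3.0, size=(n_users, 3)).astype(float)  # e.g., session counts
logits = 0.8 * X[:, 0] + 0.5 * X[:, 2] - 4.0
pseudo = (rng.random(n_users) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, pseudo)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)   # 0.5 threshold

# False positives: predicted buyers with no recorded purchase -> dropout candidates.
fp_mask = (pseudo == 0) & (pred == 1)
# True negatives: agreement that no purchase occurred -> treated as visitors.
tn_mask = (pseudo == 0) & (pred == 0)
print(f"dropout-buyer candidates: {fp_mask.sum()}, visitors: {tn_mask.sum()}")
```

The FP and TN masks partition exactly the users with missing outcomes, which is the property the imputation step relies on.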

Suppose $m_0$ visitors and $m_1$ dropout buyer candidates have been identified among the $N - n$ users with missing values. Without loss of generality, let us assume the first $m_0$ users in the missing set are those visitors. Then we write $\delta_{\mathrm{mis}}$ as

$\delta_{\mathrm{mis}} = (0, \ldots, 0, 1, \ldots, 1)^\top,$

where the first $m_0$ entries equal 0, representing visitors, and the remaining $m_1$ entries equal 1, representing dropout buyers. Similarly, we denote the corresponding continuous response for the purchase amount as

$y_{\mathrm{mis}} = (0, \ldots, 0, y_{n+m_0+1}, \ldots, y_N)^\top,$

where the zeros represent the purchase amounts from estimated visitors and $y_{n+m_0+1}, \ldots, y_N$ represent the missing non-negative responses from the dropout buyer candidates. In the following imputation methods, we consider $\delta_{\mathrm{mis}}$ as known and the aim is to impute $y_{\mathrm{mis}}$.

Clustering improves data analysis efficiency by identifying inherent structure patterns and partitioning the large-scale data set into small subsets. In each stratum (described later in the stratification step), we perform the k-means clustering method (macqueen1967some) to form $G$ clusters, which is formulated as

$\min_{C_1, \ldots, C_G} \sum_{g=1}^{G} \sum_{i \in C_g} \|\mathbf{x}_i - \mathbf{c}_g\|_2^2,$

where $C_g$ is the set of users in the $g$-th cluster and $\mathbf{c}_g$ is its centroid.

Within each cluster, we suggest the k-nearest neighbors (kNN) approach for imputation. The main idea of the kNN method is that nearby data points are similar to each other. The kNN algorithm is straightforward and does not require parametric model estimation, but it is computationally expensive and becomes slow as the size of the data set increases. However, this computational burden is greatly mitigated by the strategy of clustering. Given the specific cluster (i.e., the fixed constraint in (1)), the imputation problem (1) with the kNN method can be written as

$\min_{\hat{\delta}_i} \sum_{j \in \mathcal{N}_k(i)} (\hat{\delta}_i - \delta_j)^2, \qquad (2)$

where $\delta_j$ is the binary label of neighbor $j$, $k$ is a positive integer representing the size of the target user's nearest neighborhood, and $\mathcal{N}_k(i)$ is the index set of the target user's $k$ nearest neighbors. In this work, we use a fixed value $k = 15$. It is not difficult to derive the solution to the objective function, which is written as

$\hat{\delta}_i = \frac{1}{k} \sum_{j \in \mathcal{N}_k(i)} \delta_j,$

where $\hat{\delta}_i$ is the average of the binary responses in the $k$ nearest neighbors.

With the imputed $\hat{\delta}_i$, we obtain the corresponding imputed missing value $\hat{y}_i$ from the cost function formulated as

$\min_{\hat{y}_i} \sum_{j \in \mathcal{N}_k(i)} (\hat{y}_i - y_j)^2.$

That is, the estimated $\hat{y}_i$ is given by

$\hat{y}_i = \frac{1}{k} \sum_{j \in \mathcal{N}_k(i)} y_j,$

where $\hat{y}_i$ is the average of the responses in the $k$ nearest neighbors.

The nearest neighbors are determined based on their distances to the target user; that is, $\mathcal{N}_k(i)$ contains the indices of the $k$ users $j$ with the smallest distances $d(\mathbf{x}_i, \mathbf{x}_j)$, where $d(\mathbf{x}_i, \mathbf{x}_j)$ is the distance between users $i$ and $j$. In this study, we use the L2-norm to measure distances.
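A minimal sketch of the within-cluster kNN imputation, using synthetic data and scikit-learn's `KMeans` and `NearestNeighbors`. The stratum, feature construction, and sizes are hypothetical; the structure (cluster first, then average the $k$ nearest observed responses) follows the method above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# Synthetic stratum: features for users with observed outcomes, and features
# for dropout-buyer candidates whose outcome y is missing.
X_obs = rng.normal(size=(500, 3))
y_obs = np.abs(rng.normal(loc=50.0, scale=10.0, size=500))
X_mis = rng.normal(size=(40, 3))

K_CLUSTERS, K_NEIGHBORS = 5, 15

# Step 1: k-means clustering of the observed users within the stratum.
km = KMeans(n_clusters=K_CLUSTERS, n_init=10, random_state=0).fit(X_obs)
labels = km.labels_

# Step 2: for each user with a missing outcome, search neighbors only inside
# the cluster with the nearest centroid (the constraint in problem (1)).
y_imputed = np.empty(len(X_mis))
for i, x in enumerate(X_mis):
    g = km.predict(x[None, :])[0]                 # cluster of the target user
    members = np.where(labels == g)[0]
    nn = NearestNeighbors(n_neighbors=min(K_NEIGHBORS, len(members)))
    nn.fit(X_obs[members])                        # L2 distance by default
    _, idx = nn.kneighbors(x[None, :])
    y_imputed[i] = y_obs[members[idx[0]]].mean()  # neighborhood average
```

Restricting the neighbor search to one cluster is what turns the quadratic all-pairs distance cost into a per-cluster cost, which is the point of the cluster-based formulation.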

The data set in online controlled experimentation is often too large to cluster directly in the imputation step. To reduce the computational burden of clustering, we propose a stratification-based clustering approach. The key idea is to first stratify the user pool and then perform clustering within each stratum.

In the stratification step, we stratify users by two hierarchical levels: treatment assignment and users' buying characteristics. The treatment assignment, including the treatment group and the control group, is determined by the experimentation configuration. Generally, in online controlled experiments there are two treatment assignments: control and treatment. However, more than two treatment assignments are possible in cases such as multivariant experiments. Users' buying characteristics, including new buyers, infrequent buyers, frequent buyers, and idle buyers, are categorized based on users' purchase activities at eBay; there are in total 12 buyer categories. Note that both the experimentation configuration and the users' buying segments are determined prior to the start of the experimentation. The hierarchical stratification is formulated as

$\mathcal{X} = \bigcup_{t=1}^{T} \bigcup_{b=1}^{B} S_{t,b},$

where $S_{t,b}$ is the stratum at the $t$-th treatment level and the $b$-th users' buying characteristic in the feature space $\mathcal{X}$, and there are in total $T$ levels of treatment assignment and $B$ levels of users' buying characteristics.
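Since both stratification variables are known before the experiment starts, the stratification reduces to a group-by over those two columns. A minimal sketch with hypothetical column names and segment labels:

```python
import pandas as pd

# Hypothetical experiment log: both stratification variables are fixed
# before the experiment begins.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "treatment": ["control", "treatment", "control",
                  "treatment", "control", "treatment"],
    "buyer_segment": ["new", "frequent_II", "idle_I",
                      "new", "frequent_III", "idle_I"],
})

# Each (treatment, buyer_segment) pair defines one stratum; clustering and
# kNN imputation then run independently inside each stratum.
strata = {key: grp for key, grp in users.groupby(["treatment", "buyer_segment"])}
```

Because strata are disjoint, the per-stratum clustering and imputation steps are embarrassingly parallel, which is what makes the approach practical at the stated scale.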

The combination of stratification and clustering within each stratum greatly improves computational efficiency in the imputation step, where the neighbors of the target user are searched only within the cluster it belongs to.

The number of clusters in each stratum is obtained by maximizing a simplified version of the Silhouette score, also known as the simplified Silhouette. The Silhouette score is an effective measure of clustering goodness (rousseeuw1987silhouettes), but it requires an intensive computation of the distance between each data point and all other data points. The simplified Silhouette improves the computational efficiency of the Silhouette score by calculating only the distances between each data point and the centroids of the clusters (hruschka2004evolutionary). The simplified Silhouette of data point $i$, denoted as $s_i$, is defined as

$s_i = \frac{b_i - a_i}{\max(a_i, b_i)},$

where $a_i$ is the distance between data point $i$ and the centroid of the cluster it belongs to, and $b_i$ is the minimum of the distances between data point $i$ and the centroids of the other clusters. The final simplified Silhouette is the average of all data points' simplified Silhouettes. Note that the distances of each data point to its cluster centroid have already been calculated and recorded during the modeling process of k-means clustering, which greatly reduces the computational burden of the simplified Silhouette.
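The simplified Silhouette can be computed directly from the point-to-centroid distances that k-means already produces. The following sketch (synthetic blobs, hypothetical helper name) selects the number of clusters by maximizing it:

```python
import numpy as np
from sklearn.cluster import KMeans

def simplified_silhouette(X, km):
    """Average simplified Silhouette using point-to-centroid distances only."""
    D = km.transform(X)          # distances from each point to every centroid
    order = np.sort(D, axis=1)
    a = order[:, 0]              # distance to own (nearest) centroid
    b = order[:, 1]              # distance to the nearest other centroid
    return np.mean((b - a) / np.maximum(a, b))

rng = np.random.default_rng(2)
# Three well-separated blobs; the score should peak near G = 3.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in (0.0, 5.0, 10.0)])

scores = {}
for g in range(2, 6):
    km = KMeans(n_clusters=g, n_init=10, random_state=0).fit(X)
    scores[g] = simplified_silhouette(X, km)
best_g = max(scores, key=scores.get)
```

Compared with the full Silhouette, this costs $O(NG)$ distances instead of $O(N^2)$, which matches the motivation given above.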

A pseudo-code for the proposed method is summarized in Algorithm 1.

1: INPUT: the binary response $\boldsymbol{\delta}$, the continuous response $\mathbf{y}$, the pseudo-response $\tilde{\boldsymbol{\delta}}$, and the predictor features $\mathbf{X}$.
2: Perform the logistic regression model on the data set with the pseudo-response $\tilde{\boldsymbol{\delta}}$ and the predictor features $\mathbf{X}$. Obtain the classification results, including the false positive (FP) and true negative (TN) cases.
3: Stratification. Stratify based on the treatment assignment and the users' buying characteristics.
4: for each stratum do
5:    Use the FP cases as the test set, and the rest as the training set in the kNN method.
6:    for each target user $i$ in the test set do
7:       Clustering. Perform k-means clustering in the stratum.
8:       Find the cluster that the target user belongs to.
9:       Imputation. Within that cluster, perform the kNN method to find the $k$ nearest neighbors.
10:      if $\hat{\delta}_i > 0.5$ then
11:         Impute $\hat{\delta}_i = 1$ and $\hat{y}_i$ as the average of the neighbors' responses.
12:      else
13:         Impute $\hat{\delta}_i = 0$ and $\hat{y}_i = 0$.
14:      end if
15:   end for
16: end for
17: OUTPUT: the imputed $\hat{\delta}_{\mathrm{mis}}$ and $\hat{y}_{\mathrm{mis}}$.
Algorithm 1: Pseudo-code for the proposed method

Competing Methods and Performance Measures

To evaluate the proposed method's performance, we compare the proposed method with a list of benchmark methods, including

  1. Complete-case analysis (BM1);

  2. Unconditional control-mean imputation (BM2);

  3. Unconditional treatment-mean imputation (BM3);

  4. Unconditional zero imputation (BM4);

  5. Best-case analysis (BM5);

  6. Worst-case analysis (BM6).

Complete-case analysis removes cases with missing values and uses only cases with complete outcomes. Specifically, we discard $y_{\mathrm{mis}}$ and keep only $(y_1, \ldots, y_n)^\top$, so the sample size is reduced to $n$. The complete-case analysis is easy to implement but wastes information, especially when the number of incomplete cases is substantial.

Unconditional control-mean imputation uses the mean of the observed users in the control group to impute missing values, while unconditional treatment-mean imputation uses the mean of the observed users in the treatment group. That is,

$\hat{y}_i = \bar{y}_{\mathcal{C}} = \frac{1}{n_{\mathcal{C}}} \sum_{j \in \mathcal{C}} y_j \quad \text{or} \quad \hat{y}_i = \bar{y}_{\mathcal{T}} = \frac{1}{n_{\mathcal{T}}} \sum_{j \in \mathcal{T}} y_j, \quad i \in \mathcal{M},$

where $\mathcal{C}$ is the index set of observed users in the control group, $\mathcal{T}$ is the index set of observed users in the treatment group, and $n_{\mathcal{C}}$ and $n_{\mathcal{T}}$ are the sample sizes in the control group and the treatment group, respectively. Unconditional zero imputation uses zero to impute missing values, that is, $\hat{y}_i = 0$ for $i \in \mathcal{M}$.

These three imputation methods are different types of single-value imputation, which keeps the full data size. However, they treat the missing values as fixed, distorting the distribution and ignoring the uncertainty in the missing values.

The best-case analysis imputes missing values in the treatment (control) group with the mean of the observed users in the treatment (control) group. In contrast, the worst-case analysis imputes missing values in the treatment (control) group with the mean of the observed users in the control (treatment) group. Here, we assume that the tested feature in nature has a positive impact, and thus the mean in the treatment group is expected to be greater than the mean in the control group. The best-case analysis and the worst-case analysis are expressed as

$\text{best case: } \hat{y}_i^{\mathcal{T}} = \bar{y}_{\mathcal{T}}, \; \hat{y}_i^{\mathcal{C}} = \bar{y}_{\mathcal{C}}; \qquad \text{worst case: } \hat{y}_i^{\mathcal{T}} = \bar{y}_{\mathcal{C}}, \; \hat{y}_i^{\mathcal{C}} = \bar{y}_{\mathcal{T}},$

where $\hat{y}_i^{\mathcal{T}}$ ($\hat{y}_i^{\mathcal{C}}$) is the imputed missing value in the treatment (control) group, and $\bar{y}_{\mathcal{T}}$ ($\bar{y}_{\mathcal{C}}$) is the mean of the observed values in the treatment (control) group.
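The single-value benchmarks can be written compactly. This is an illustrative sketch; the helper name and inputs are hypothetical:

```python
import numpy as np

def single_value_imputations(y_ctrl_obs, y_trt_obs, n_mis_ctrl, n_mis_trt):
    """Return each benchmark's fill values as a (control, treatment) pair."""
    mu_c, mu_t = np.mean(y_ctrl_obs), np.mean(y_trt_obs)
    return {
        "control_mean": (np.full(n_mis_ctrl, mu_c), np.full(n_mis_trt, mu_c)),
        "treatment_mean": (np.full(n_mis_ctrl, mu_t), np.full(n_mis_trt, mu_t)),
        "zero": (np.zeros(n_mis_ctrl), np.zeros(n_mis_trt)),
        # Best case (positive effect assumed): each arm keeps its own mean.
        "best_case": (np.full(n_mis_ctrl, mu_c), np.full(n_mis_trt, mu_t)),
        # Worst case: the arm means are swapped, shrinking the apparent lift.
        "worst_case": (np.full(n_mis_ctrl, mu_t), np.full(n_mis_trt, mu_c)),
    }
```

Every benchmark fills all missing users in an arm with one constant, which is exactly why these methods preserve sample size but understate variance.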

To check the performance of the proposed method, we estimate the mean $\mu_{\mathcal{C}}$ and variance $\sigma^2_{\mathcal{C}}$ of the control group, and compute the lift in the mean between the treatment group and the control group, the standard error (SE) of the difference between the treatment and control groups, the coefficient of variation (CV) of the control group, the zero rate (ZR), and the p-value. The lift in the mean between the treatment group and the control group is described as

$\mathrm{Lift} = \frac{\mu_{\mathcal{T}} - \mu_{\mathcal{C}}}{\mu_{\mathcal{C}}} \times 100\%,$

where $\mu_{\mathcal{T}}$ and $\mu_{\mathcal{C}}$ are the means of the treatment group and the control group, respectively.

The SE is expressed as

$\mathrm{SE} = \sqrt{\mathrm{se}_{\mathcal{T}}^2 + \mathrm{se}_{\mathcal{C}}^2},$

where $\mathrm{se}_{\mathcal{T}}$ and $\mathrm{se}_{\mathcal{C}}$ are the standard errors for the treatment group and the control group, respectively.

In online experimentation, the faster we run experiments, the more economic benefits and the lower operational costs we achieve. Given constant user traffic, running experiments faster means that a smaller number of users is required (wu2011experiments; deng2013improving). The CV is proportional to the number of users required to achieve a pre-determined statistical power. The CV is expressed as

$\mathrm{CV} = \frac{\sigma_{\mathcal{C}}}{\mu_{\mathcal{C}}}.$

The smaller the CV, the smaller the user size required to detect the difference at the specified statistical power, and thus the higher the sensitivity.

The ZR is the ratio of the number of zeros in the imputed outcome $\hat{\mathbf{y}}$ to the total data size $N$, described as

$\mathrm{ZR} = \frac{\#\{i : \hat{y}_i = 0\}}{N}.$

The ZR evaluates the proportion of visitors with the outcome as zero after the imputation.
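Assuming imputed outcome vectors for the two groups, the four measures might be computed as follows (the helper name is hypothetical):

```python
import numpy as np

def performance_measures(y_ctrl, y_trt):
    """Lift, SE of the difference, control-group CV, and zero rate."""
    mu_c, mu_t = y_ctrl.mean(), y_trt.mean()
    se_c = y_ctrl.std(ddof=1) / np.sqrt(len(y_ctrl))  # SE of the control mean
    se_t = y_trt.std(ddof=1) / np.sqrt(len(y_trt))    # SE of the treatment mean
    return {
        "lift_pct": 100.0 * (mu_t - mu_c) / mu_c,
        "se_diff": np.sqrt(se_t ** 2 + se_c ** 2),    # SE of the difference
        "cv_ctrl": y_ctrl.std(ddof=1) / mu_c,
        "zero_rate": np.mean(np.concatenate([y_ctrl, y_trt]) == 0),
    }
```

Note that because single-value imputation makes every filled value identical, it mechanically lowers the sample standard deviation, which is the distortion the text warns about.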

Case Study: Search Ranking Experiment

To illustrate the proposed method, this section uses a past experiment whose objective was to improve eBay's item ranking of search results based on one ranking algorithm. The experiment hypothesis is that integrating information about negative buyer experiences into the ranking algorithm will reduce the visibility of items with a high probability of negative buyer experiences in search results, resulting in lower product return rates and increased revenues. The experiment lasted three weeks. A portion of eligible eBay users were selected and randomized into three variants: two treatment groups and one control group. The number of participating users in each variant exceeds 10 million. One of the most important outcomes is related to purchases, denoted here as PR.

The outcome PR is incomplete due to its high missing rate. The PR is recorded when users made purchases during the experiment's data collection period, but not when either of the following occurred: users did not make purchases, or the platform was unable to record the purchases before the end of the experiment's data collection period. To impute PR and thus identify visitors and dropout buyers, we use the following informative covariates: the treatment assignment, the number of sessions, the number of sessions with searches, the number of sessions with qualified events highly related to purchases at eBay, and the user's buying characteristics. The treatment assignment is pre-determined before running the experiment to assign users to the treatment group and the control group. The number of sessions corresponds to the number of sessions users have throughout the experiment. The number of sessions with searches is the number of sessions that contain at least one search activity. The number of sessions with qualified events is the number of sessions that include at least one qualified event activity. The buying characteristics of users are their historical purchasing patterns at eBay. These covariates are complete and do not have missing values. We impute the outcome PR using the proposed cluster-based imputation method. In the stratification step, we divided the large-scale data set into smaller subsets based on two variables: the treatment assignment and the user's buying characteristic. When performing clustering within each stratum, we use the number of sessions, the number of sessions with searches, and the number of sessions with qualified events.

In Table 2, we compare the performance of the proposed cluster-based imputation method and the benchmark methods. The proposed method has a smaller mean in the control group than the other methods except for BM4. The proposed imputation method identifies visitors and dropout buyers among the missing values: it imputes zeros for the visitors, who constitute a portion of the users with missing outcomes, and positive values for the dropout buyers. Compared to BM4, the proposed imputation method imputes fewer zeros and thus has a larger mean in the control group. Compared to the mean-imputation methods that impute all missing values with a single value, the proposed imputation method has more zeros and a smaller mean in the control group. The proposed method has a larger CV in the control group than all other methods, with the exception of BM4. This is largely attributable to the change in the mean of the control group, as the pooled standard errors for all methods, with the exception of BM1, are quite close. The proposed method has the smallest lift, and all methods have a consistent direction of lift. Based on the p-value and a Type I error of 10%, the proposed method and BM5 are statistically significant, indicating that there is sufficient evidence to reject the null hypothesis, whereas the other methods are not. This is expected because single imputation methods are well known to dilute mean differences, producing results suggesting no difference between the control group and the treatment group. The proposed method has a larger variance in the control group and a larger SE than the other methods except for BM1. BM1 has a reduced sample size, resulting in the largest variance and SE for the control group. Unlike the other methods, with the exception of BM1, the proposed method does not ignore variance among the missing values, resulting in a greater variance.

Method $\sigma^2_{\mathcal{C}}$ $\mu_{\mathcal{C}}$ CV Zero rate Lift (%) SE p-value
BM1 107035.21 1235.8 0.265 0.00 -0.37 0.33 0.17
BM2 20003.17 390.5 0.362 0.00 -0.16 0.06 0.28
BM3 20004.96 389.9 0.363 0.00 -0.17 0.06 0.28
BM4 20693.30 213.7 0.673 0.83 -0.29 0.06 0.31
BM5 20003.17 390.5 0.362 0.00 -0.29 0.06 0.05
BM6 20004.96 389.9 0.363 0.00 -0.03 0.06 0.82
Proposed 21661.66 246.3 0.598 0.80 -0.60 0.06 0.02
Table 2: Performance comparisons of benchmark methods in the ranking search experiment. Note that the values of $\sigma^2_{\mathcal{C}}$, $\mu_{\mathcal{C}}$, CV, and SE are not real and are masked with a particular linear transformation to meet the disclosure requirement.

Figure 1 illustrates the increase in the mean of the control group across users' buying segments for the proposed cluster-based imputation method and the zero-imputation method. Different user segments have different mean values, with the top two being frequent buyer levels II and III. The proposed imputation method has larger mean values than the zero-imputation method in nearly all user segments. The frequent buyer level II and III segments have considerably larger mean increases than the idle buyer levels. This suggests that dropout buyers are more likely to occur in frequent buyer levels II and III, while in segments such as the idle buyer levels, users with unrecorded outcomes are more likely to be visitors. This is consistent with the findings in Figure 2 regarding the allocation of the zero rate across user segments. Different user segments have varying degrees of zero rate. The zero rates for frequent buyer levels II and III are approximately 45%, whereas the zero rates for idle buyer levels II and III are above 90%. This is reasonable given that frequent buyers at levels II and III are more likely to make purchases, resulting in few zero values for outcome PR. The high zero rate corresponds to the low mean value in Figure 1.

Figure 1: Comparison of $\mu$ across user segments between the proposed imputation method and the zero imputation method for the treatment group. The tick values on the vertical axis are omitted due to disclosure restrictions.
Figure 2: Comparison of zero rate across user segments between the proposed imputation method and the zero imputation method.

Figure 3 shows the distribution of CV across user segments for the proposed imputation method and the zero imputation method. For both methods, the CV values for the frequent buyer levels are less than half of those for the idle buyer levels. However, the CV of the proposed method is consistently lower than that of the zero imputation method across all user segments. The decrease in the CV indicates an improvement in sensitivity for the outcome PR. This improvement in sensitivity is largely attributable to the change in mean values.

Figure 3: Comparison of CV across user segments between the proposed imputation method and the zero imputation method for the treatment group. The tick values on the vertical axis are omitted due to disclosure restrictions.

Discussion

Metrics provide strong evidence to support hypotheses in online experimentation and hence reduce debates in the decision-making process. This paper introduces the concept of dropout buyers and classifies users with incomplete metric values into two categories: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a cluster-based k-nearest neighbors imputation method. The proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, and it incorporates uncertainty among missing values in the outcome metrics through the k-nearest neighbors method. To facilitate efficient imputation in large-scale data sets in online experimentation, the proposed method employs a combination of stratification and clustering. The stratification approach divides the entire large-scale data set into small subsets to improve computational efficiency in the clustering step. The clustering approach identifies inherent structure patterns to improve the performance of the k-nearest neighbors method within each cluster.

It is worth remarking that in this work the k-nearest neighbors method used the unweighted average of the responses of the nearest neighbors. A weighted average of nearest neighbors has been proposed, in which different data points in the neighborhood contribute differently based on their distances from the target point (hechenbichler2004weighted); that is, nearby data points, which are closer to the target, have a higher influence on the decision than distant data points. Another direction for future research is to study the effects of dynamic numbers of nearest neighbors (ougiaroglou2007adaptive) in the proposed imputation framework. Additionally, the proposed imputation method imputes missing values for each user with missing outcomes individually. It would be interesting to categorize users with missing outcomes into various hubs and investigate an imputation strategy for each hub of users altogether.
