1 Introduction
The growth of social networking websites such as Facebook, Google+ and Instagram, along with ubiquitous mobile devices, has enabled people to generate multimedia content at an exponentially increasing rate. Due to the easy-to-use photo-capturing process of mobile devices, people share close to two billion photos per day on social networking sites (source: http://www.kpcb.com/internettrends). People want their photos to be visually attractive, which has given rise to automated, one-touch enhancement tools. However, most of these tools are predefined image filters that lack the ability to perform content-adaptive or personalized enhancement. This has fueled the development of machine-learning-based image enhancement algorithms.
Many existing machine-learned image enhancement approaches first learn a model that predicts a score quantifying the aesthetics of an image. Then, given a new low-quality image (in the rest of this article, we call images before enhancement low-quality and those after enhancement high-quality; the process of enhancing a new image is called “the testing stage”), a widely followed strategy to generate its enhanced version is as follows:

1. Generate a large number of candidate enhancement parameters (the brightness, saturation and contrast are referred to as the “parameters” of an image in this article) by densely sampling the entire range of image parameters. Computational complexity may be reduced by applying heuristic criteria, such as densely sampling only the region of the parameter space near the most similar training images.
2. Apply these candidate parameters to the original low-quality image to create a set of candidate images.
3. Extract features from every candidate image and compute its aesthetic score using the learned model.
4. Present the highest-scoring image to the user.
The above strategy has two obvious drawbacks. First, generating a large number of candidate parameters and applying them to create candidate images may be computationally prohibitive even for low-dimensional parameters. For example, if each of three parameters is sampled at $n$ values, the search space already contains $n^3$ combinations. Second, even if creating candidate images is efficient, extracting features from them is computationally intensive and is typically the bottleneck. Moreover, such heuristic methods need constant interaction with the training database (which might be stored on a server), which makes the parameter prediction suboptimal. All these factors make the testing stage inefficient.
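To make the cost of the dense-sampling strategy concrete, the following minimal Python sketch enumerates a grid of $n$ values per parameter around the current (saturation, brightness, contrast) and scores every candidate. The function name `brute_force_enhance`, the sampling window and the toy `score_fn` are illustrative assumptions; in a real system each call to `score_fn` would also require rendering the candidate image and extracting its features.

```python
import itertools

import numpy as np


def brute_force_enhance(image_params, score_fn, n=20, spread=0.5):
    """Hypothetical sketch of the dense-sampling baseline described above.

    image_params: (saturation, brightness, contrast) of the low-quality image.
    score_fn: stand-in for the learned aesthetic model; here it scores the
        parameters directly, whereas a real system would first render the
        candidate image and extract its features.
    Returns the best candidate and the number of candidates evaluated.
    """
    # Densely sample each parameter in a window of +/- spread around its value.
    axes = [np.linspace(p * (1 - spread), p * (1 + spread), n) for p in image_params]
    best, best_score, evaluated = None, -np.inf, 0
    for candidate in itertools.product(*axes):  # n ** 3 candidates in total
        score = score_fn(candidate)
        evaluated += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, evaluated


# Toy score: prefer parameters close to an arbitrary target (illustration only).
target = (0.6, 0.55, 0.5)
best, count = brute_force_enhance(
    (0.4, 0.5, 0.45), lambda c: -sum((a - b) ** 2 for a, b in zip(c, target))
)
print(count)  # 8000 candidates for n = 20
```

Doubling $n$ multiplies the number of candidates, and hence of feature extractions, by eight; this is the cost that predicting the enhancement parameters directly is meant to avoid.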
Our approach assumes that a model quantifying image aesthetics has already been learned and instead focuses on a structured approach to enhancement parameter prediction. During training, we learn the interrelationship among the low-quality images, their features, their parameters and the parameters of their high-quality (enhanced) versions. During the testing stage, we only have access to a new low-quality image, its features and parameters, and the learned model, and we have to predict the enhancement parameters. Using these predicted parameters, we can generate candidate images and select the best one using the learned aesthetic model. The stringent requirement of not accessing the training images arises from real-world constraints. For example, to enhance a single image, it would be inefficient to establish a connection with the training database, generate hundreds of candidate images, perform feature extraction on them and only then find the best image.
The search space spanned by the parameters is huge. However, the enhancement parameters are not randomly scattered; instead, they depend on the parameters and features of the original low-quality image. We therefore hypothesize that the enhancement parameters have a low-dimensional structure in a latent space. We employ a matrix factorization (MF) based approach because it allows us to express the enhancement parameters in terms of three latent factors, which model the interaction across: 1. the low-quality images, 2. their corresponding enhancement parameters, and 3. the low-quality parameters. The latent factors are learned during inference by Gibbs sampling. Additionally, we need to incorporate the low-quality image features, since the enhancement parameters also depend on the color composition of the image, which the features characterize. We achieve this by representing the latent factor that models the interaction across the images as a linear combination of their features, obtained by solving a convex $\ell_{2,1}$-norm minimization problem. We review the related work on MF as well as on image enhancement in the following section.
2 Related Work
Development of machine-learned image enhancement systems has recently been an active research area of immense practical significance, and various approaches have been put forward for this task. We review works that improve the visual appearance of an image using automated techniques. To encourage research in this field, a database named MIT-Adobe FiveK, containing corresponding low- and high-quality images, was proposed in [4]. The authors also proposed an algorithm for global tonal adjustment. The tonal adjustment problem manipulates only the luminance channel, whereas we manipulate the saturation, brightness and contrast of an image.
Content-based enhancement approaches that try to improve a particular image region have been developed in the past [2, 9]. These approaches require segmentation of the regions to be enhanced, which may itself prove difficult. Approaches that operate on pixels using local scene descriptors have also been developed. First, similar images are retrieved from the training set. Then, for each pixel in the input, similar pixels are retrieved from the training set and used to improve the input pixel. Finally, Gaussian random fields maintain spatial smoothness in the enhanced image. This approach does not consider the global information provided by the image, and hence the enhancements may not be visually appealing when viewed globally. In [8], a small number of image enhancements were collected from users and then used along with additional training data.
Two recent works that train a ranking model from low- and high-quality images are presented in [5, 25]. The authors of [25] create a dataset of corresponding low- and high-quality image pairs, along with a record of the intermediate enhancement steps. A ranking model trained on this type of data can quantify the aesthetics of an image. In [5], non-corresponding low- and high-quality image pairs extracted from the Web are used to train a ranking model. Both approaches use nearest-neighbor (NN) search during the testing stage to create candidate images. After extracting features from the candidates and ranking them, the best image is presented to the user.
The task of enhancement parameter prediction is related to attribute prediction [17, 18, 11, 7]. However, work on attribute prediction aims to predict the relative strength of an attribute in a data sample (or image). We are not aware of any work that predicts the parameters of an enhanced version of a low-quality image given only the parameters and features of that image. Since our approach is based on MF principles, we review recent related work on MF.
MF [19, 15, 20, 10, 24] is extensively used in recommender systems [12, 1, 13, 23, 14, 22, 21]. These systems predict the rating of an item for a user given his or her existing ratings for other items. For example, in the Netflix problem, the task is to predict a user's favorite movies based on his or her existing ratings. MF-based solutions exploit the following two key properties of such user-item rating matrices. First, the items preferred by a user have some similarity to other items preferred by that user (or by other similar users, if we have sufficient knowledge to build a similarity list of users). Second, although this matrix is very high-dimensional, the patterns in it are structured and hence should lie on a low-dimensional manifold. For example, the Netflix data contains thousands of movies, each rated on a five-point scale, so the number of possible rating combinations per user (five raised to the number of movies) vastly exceeds the number of users. Therefore, the number of actual variations in the rating matrix should be much smaller than the number of all possible rating combinations, and these variations can be modeled by latent variables lying near a low-dimensional manifold. This principle is formalized in [15] as probabilistic matrix factorization (PMF), which hypothesizes that the rating matrix can be decomposed into two latent matrices corresponding to users and movies, whose dot product gives the user ratings. PMF works fairly well on a large-scale dataset such as Netflix; however, many parameters have to be tuned. This requirement is alleviated in [20] by a Bayesian approach to MF (BPMF). BPMF has been extended to temporal data as Bayesian probabilistic tensor factorization (BPTF) in [24]. MF is also used in other domains such as computer vision, for example to predict the feature vector of a person from another viewpoint given the feature vector for one viewpoint [6]. We adopt and modify BPTF since it allows us to model the joint interaction across low-quality images, their corresponding enhancement parameters and the low-quality parameters. In the next section, we detail our problem formulation and proposed approach.
3 Problem Formulation and Proposed Approach
We have a training set consisting of $N$ low-quality images $I_1, \dots, I_N$. (We use bold letters to denote matrices; non-bold letters denote scalars or vectors, which will either be clear from the context or be mentioned explicitly.) Each image is described by $M$ parameters (here $M = 3$: saturation, brightness and contrast), and the parameters of all images are represented as $\mathbf{X} \in \mathbb{R}^{N \times M}$, whose $i$-th row $x_i$ holds the parameters of $I_i$. Each image has $K$ enhanced versions, each of the same size as its corresponding low-quality image, and all versions are of higher quality than that image. The parameters of all versions of the $i$-th image (also called candidate parameters) are represented as $\mathbf{Y}_i \in \mathbb{R}^{K \times M}$. The features of all low-quality images are represented as $\mathbf{F} \in \mathbb{R}^{N \times D}$, whose $i$-th row $f_i$ is the $D$-dimensional feature vector of $I_i$; in practice, $D$ is much larger than $M$. Our goal is to predict the candidate parameters $\mathbf{Y}_i$ of all versions of the $i$-th image using only the information provided by $x_i$ and $f_i$. To the best of our knowledge, this is a novel problem of real significance that has not been addressed in the literature. We now explain our proposed approach.
As mentioned before, our task is to predict the candidate parameters of all enhanced versions of a low-quality image from its parameters and features. The values of all parameters of the $N$ images and their versions ($N(K+1)$ images in total) can be stored in a three-dimensional array $\mathcal{T} \in \mathbb{R}^{N \times (K+1) \times M}$, where $\mathcal{T}_{i1m}$ is the value of parameter $m$ of the $i$-th low-quality image and $\mathcal{T}_{i(k+1)m}$ is the value of parameter $m$ of version $k$ of that image. We need to predict $\mathcal{T}$, or in fact only its version entries $\mathcal{T}_{i(k+1)m}$, $k = 1, \dots, K$. Given a new low-quality image, its own parameters are known, so we only need to predict the entries of its versions.
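A minimal NumPy sketch of how the array $\mathcal{T}$ could be assembled from $\mathbf{X}$ and the $\mathbf{Y}_i$ follows; the function name and the random toy data are assumptions for illustration. For a test image, only the first slice is known and the remaining $K$ slices are what the model predicts.

```python
import numpy as np


def build_parameter_tensor(X, Y):
    """Stack low-quality parameters and version parameters into one 3-D array.

    X: (N, M) parameters of the N low-quality images.
    Y: (N, K, M) parameters of the K enhanced versions of each image.
    Returns T of shape (N, K + 1, M). NumPy indexing is 0-based, so T[:, 0, :]
    holds X (the first slice in the text) and T[:, 1:, :] holds the versions.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.concatenate([X[:, None, :], Y], axis=1)


rng = np.random.default_rng(0)
X = rng.uniform(size=(4, 3))     # N = 4 images, M = 3 parameters
Y = rng.uniform(size=(4, 5, 3))  # K = 5 versions per image
T = build_parameter_tensor(X, Y)
print(T.shape)  # (4, 6, 3)
```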
During training, $\mathcal{T}$ is fully observed, since it is assembled from the available parameters $\mathbf{X}$ and $\mathbf{Y}_i$. Following MF principles [20, 24], we express $\mathcal{T}$ as an inner product of three latent factors, $\mathbf{U} \in \mathbb{R}^{R \times N}$, $\mathbf{V} \in \mathbb{R}^{R \times (K+1)}$ and $\mathbf{W} \in \mathbb{R}^{R \times M}$, where $R$ is the latent factor dimension. These latent factors should model the underlying low-dimensional subspaces corresponding to the low-quality images, their enhanced versions and their parameters, respectively. This can be formulated as:
$$\mathcal{T}_{ijk} \approx \langle U_{:i}, V_{:j}, W_{:k} \rangle = \sum_{r=1}^{R} U_{ri} V_{rj} W_{rk}, \tag{1}$$
where $U_{:i}$ denotes the $i$-th column of $\mathbf{U}$ (and similarly for $\mathbf{V}$ and $\mathbf{W}$). Presumably, as we increase $R$, the approximation error should decrease (or stay constant) if the prior parameters for the latent factors $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{W}$ are chosen correctly. Following [20], we choose normal distributions (with precision $\alpha$) for 1. the conditional distribution $p(\mathcal{T}_{ijk} \mid U_{:i}, V_{:j}, W_{:k}, \alpha) = \mathcal{N}(\langle U_{:i}, V_{:j}, W_{:k} \rangle, \alpha^{-1})$ and 2. the prior distributions $p(\mathbf{U} \mid \Theta_U)$, $p(\mathbf{V} \mid \Theta_V)$ and $p(\mathbf{W} \mid \Theta_W)$, where $\Theta_U = \{\mu_U, \Lambda_U\}$, $\Theta_V = \{\mu_V, \Lambda_V\}$ and $\Theta_W = \{\mu_W, \Lambda_W\}$. Here $\alpha$ and the $\Theta$'s are hyperparameters, and $\Lambda$ and $\mu$ denote the multivariate precision matrix and the mean, respectively. Since the Wishart distribution is a conjugate prior for the precision matrix of a multivariate normal distribution, we put Gaussian-Wishart priors on all hyperparameters (for details, see the supplementary material on the authors' website). We find the latent factors $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{W}$ by performing inference through Gibbs sampling, which samples each latent variable from its distribution conditional on the current values of the other variables. The predictive distribution for a new entry of $\mathcal{T}$ can be found by a Monte Carlo approximation (explained later).
However, our problem differs from previous work on MF [20, 24] in the following major ways. In product or movie rating prediction, an average (non-personalized, though not necessarily identical for all users) recommendation may be provided to a user who has not expressed any preferences. In our case, each image may require a different kind of parameter adjustment to create its enhanced version, so no “average” adjustment exists. As explained before, the adjustment should depend on the image's features, which characterize that image (e.g., bright vs. dull, muted vs. vibrant). In our problem, it is particularly difficult to obtain good generalization on the testing set, as we shall see later. The loss in performance of existing approaches on the testing set can be attributed to the different parameter adjustments required by each image. It therefore becomes necessary to include the information obtained from image features in the formulation. We show that simply concatenating the parameters and features and applying the MF techniques of [20, 24] does not provide good performance, possibly because parameters and features lie in different regions of the feature space.
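The following Python sketch illustrates two ingredients of the model under simplifying assumptions: the trilinear reconstruction of Equation 1, and one Gibbs step that draws a column $U_{:i}$ from its Gaussian conditional given $\mathbf{V}$, $\mathbf{W}$, $\alpha$ and the hyperparameters $(\mu_U, \Lambda_U)$, in the style of BPTF [24]. It assumes the slice $\mathcal{T}_{i::}$ is fully observed and omits the sampling of $\mathbf{V}$, $\mathbf{W}$, $\alpha$ and the Gaussian-Wishart hyperparameters; the function names are hypothetical.

```python
import numpy as np


def reconstruct(U, V, W):
    """Equation 1: T[i, j, m] ~= sum_r U[r, i] * V[r, j] * W[r, m]."""
    return np.einsum("ri,rj,rm->ijm", U, V, W)


def sample_U_column(T_i, V, W, alpha, mu_U, Lambda_U, rng):
    """Draw U[:, i] from its Gaussian conditional given V, W and hyperparameters.

    T_i: (J, M) observed slice T[i]; V: (R, J); W: (R, M);
    alpha: observation precision; mu_U: (R,); Lambda_U: (R, R) prior precision.
    """
    R = V.shape[0]
    # Row (j, m) of Q is the elementwise product V[:, j] * W[:, m].
    Q = np.einsum("rj,rm->jmr", V, W).reshape(-1, R)
    t = T_i.reshape(-1)  # same (j, m) row-major order as the rows of Q
    precision = Lambda_U + alpha * Q.T @ Q
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ (Lambda_U @ mu_U + alpha * Q.T @ t)
    return rng.multivariate_normal(mean, cov)


rng = np.random.default_rng(0)
R, N, J, M = 3, 4, 6, 3  # J = K + 1 slices per image
U_true = rng.standard_normal((R, N))
V = rng.standard_normal((R, J))
W = rng.standard_normal((R, M))
T = reconstruct(U_true, V, W)  # noiseless toy data, shape (N, J, M)

# With a very high precision alpha and a weak prior, the draw concentrates
# around the least-squares solution, which for noiseless data is U_true[:, 0].
u0 = sample_U_column(T[0], V, W, alpha=1e8, mu_U=np.zeros(R),
                     Lambda_U=1e-6 * np.eye(R), rng=rng)
```

Because each column's conditional depends only on its own slice of $\mathcal{T}$, the columns of $\mathbf{U}$ can be drawn independently of one another, which is what makes parallel sampling possible.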
To incorporate the features, we observe that the conditional distribution of $\mathbf{U}$ factorizes over the individual samples (the columns $U_{:i}$). We therefore propose to express $\mathbf{U}$ as a linear function of the features $\mathbf{F}$, obtained by a convex optimization scheme, and integrate it into the inference algorithm that finds the latent factors. The linear transformation can be expressed as
$$\mathbf{U}^\top = \mathbf{F}\mathbf{A} + \mathbf{1}\mathbf{b}, \tag{2}$$
where $\mathbf{A} \in \mathbb{R}^{D \times R}$ holds the coefficients, $\mathbf{b} \in \mathbb{R}^{1 \times R}$ is a row-vector offset and $\mathbf{1} \in \mathbb{R}^{N \times 1}$ is a column vector of ones. Note that to carry out this decomposition, we have to set $R = D$. This is not a severe limitation, since $D$ is usually large and, as mentioned before, increasing $R$ should decrease the approximation error at the cost of increased computation. Henceforth, we assume that our feature extraction process generates $D = R$ features. Also, note that a large $R$ does not mean that the latent space is no longer low-dimensional, because $R$ is still much smaller than the number of all possible combinations of parameters.
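Once the factors are learned, prediction for a new image needs only its feature vector, without accessing any training image, as in the following sketch. It assumes point estimates of $\mathbf{A}$, $\mathbf{b}$, $\mathbf{V}$ and $\mathbf{W}$ (for example, from a single Gibbs sample or a posterior average) and uses random arrays in place of learned ones; the function name is hypothetical.

```python
import numpy as np


def predict_versions(f_new, A, b, V, W):
    """Predict enhancement parameters of a new image from its features alone.

    f_new: (D,) features of the new low-quality image.
    A: (D, R) and b: (R,) map features to the latent space (Equation 2).
    V: (R, K + 1) and W: (R, M) are learned latent factors (Equation 1).
    Returns a (K, M) array: one row of M parameters per enhanced version.
    """
    u = A.T @ f_new + b                        # latent vector of the new image
    T_hat = np.einsum("r,rj,rm->jm", u, V, W)  # (K + 1, M) slice
    return T_hat[1:]                           # drop the low-quality slice


rng = np.random.default_rng(1)
D = R = 4  # the text sets the latent dimension equal to the feature dimension
K, M = 5, 3
A = rng.standard_normal((D, R))
b = rng.standard_normal(R)
V = rng.standard_normal((R, K + 1))
W = rng.standard_normal((R, M))
f_new = rng.standard_normal(D)
P = predict_versions(f_new, A, b, V, W)
print(P.shape)  # (5, 3)
```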
We propose an iterative convex optimization process to determine the coefficients $\mathbf{A}$ and $\mathbf{b}$ of Equation 2, using the following objective function:
(3) 
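The objective function tries to reconstruct $\mathbf{U}$ using $\mathbf{A}$ and $\mathbf{b}$ while controlling the complexity of the coefficients. Let us first concentrate on the structure of $\mathbf{A}$, neglecting the effect of $\mathbf{b}$ for the moment. The columns of $\mathbf{A}$ act as coefficients for the features $\mathbf{F}$. Ideally, we would like the elements of $\mathbf{U}$ to be determined by a sparse set of features, which implies sparsity in the columns of $\mathbf{A}$. To this end, we impose an $\ell_{2,1}$-norm on $\mathbf{A}$, which gives $\mathbf{A}$ a block-row structure: entire rows, each corresponding to one feature, are driven to zero, so all columns share the same small set of selected features.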
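A small NumPy sketch of the $\ell_{2,1}$-norm used here: it sums the Euclidean norms of the rows, so penalizing it drives entire rows (features) of $\mathbf{A}$ to zero rather than isolated entries. The function name and the toy matrix are illustrative.

```python
import numpy as np


def l21_norm(Z):
    """l2,1-norm: sum over rows of the Euclidean norm of each row."""
    Z = np.atleast_2d(np.asarray(Z, dtype=float))  # a 1-D input is a row vector
    return float(np.linalg.norm(Z, axis=1).sum())


A = np.array([[3.0, 4.0],
              [0.0, 0.0],   # an all-zero row: this feature is unused
              [1.0, 0.0]])
print(l21_norm(A))            # 5.0 + 0.0 + 1.0 = 6.0
print(l21_norm([3.0, 4.0]))   # for a row vector it equals the l2-norm: 5.0
```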
Let us now consider the structure of $\mathbf{b}$ together with $\mathbf{A}$. Equation 2 shows that different columns of $\mathbf{U}$ depend on different image features $f_i$. We also expect that a different set of columns of $\mathbf{A}$ will be activated (take on large values) for different $f_i$. We add the offset $\mathbf{b}$ for regularization; the offset it introduces remains constant across all images but changes for each latent dimension. Making $\mathbf{b}$ a row vector also forces $\mathbf{A}$ to play the major role in Equation 3, which in turn increases the dependence of $\mathbf{U}$ on $\mathbf{F}$. If we were to define $\mathbf{b}$ to be of the same size as $\mathbf{U}^\top$ (i.e., a different offset for each image and each latent dimension), it would have two potential disadvantages. First, optimal $\mathbf{A}$ and $\mathbf{b}$ could be obtained trivially by setting each entry of $\mathbf{A}$ to a very small value and letting $\mathbf{b} = \mathbf{U}^\top$, which makes $\mathbf{A}$ redundant. Second, when testing on a new image, we would have to devise a strategy to determine a suitable offset, for example by taking the row of $\mathbf{b}$ that corresponds to the nearest training image. This adds unnecessary complexity and reduces generalization. By making $\mathbf{b}$ a row vector, we assume that it is possible to reach the space of enhancement parameters by linearly transforming the low-quality image features and adding a constant offset. In other words, $\mathbf{A}$ transforms the features into the region of the latent space where the other high-quality images lie, and $\mathbf{b}$ provides an offset that helps avoid overfitting. This is a joint $\ell_{2,1}$-norm problem, which can be solved efficiently by reformulating it as a convex problem. Inspired by [16], we reformulate Equation 3 as follows:
(4) 
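The $\ell_{2,1}$-norm of a matrix $\mathbf{Z} \in \mathbb{R}^{p \times q}$ is defined as $\|\mathbf{Z}\|_{2,1} = \sum_{i=1}^{p} \sqrt{\sum_{j=1}^{q} Z_{ij}^2} = \sum_{i=1}^{p} \|z^i\|_2$, where $z^i$ is the $i$-th row of $\mathbf{Z}$. Also, for a row vector $\mathbf{b}$, we have $\|\mathbf{b}\|_{2,1} = \|\mathbf{b}\|_2$. Thus Equation 4 can be further written as: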
(5) 
where and is a column vector of ones . Now, put . Thus Equation 5 becomes:
(6) 
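Equation 6 has the generic form of minimizing an $\ell_{2,1}$-norm subject to a linear equality constraint, which [16] solves with an iteratively reweighted update. The sketch below follows that scheme for a generic constraint $\mathbf{G}\mathbf{Z} = \mathbf{H}$; the names $\mathbf{G}$ and $\mathbf{H}$, the `eps` safeguard against zero rows and the fixed iteration count are assumptions, and the mapping from Equation 6 to $\mathbf{G}$ and $\mathbf{H}$ is not shown.

```python
import numpy as np


def min_l21_equality(G, H, n_iter=50, eps=1e-6):
    """Iteratively reweighted solver for  min_Z ||Z||_{2,1}  s.t.  G @ Z = H.

    Assumes G (n x p, n < p) has full row rank. Each iteration solves a
    weighted least-norm problem with D = diag(1 / (2 ||z^i||_2)).
    """
    d_inv = np.ones(G.shape[1])  # diagonal of D^{-1}; start from D = I
    for _ in range(n_iter):
        Dinv_Gt = d_inv[:, None] * G.T                 # D^{-1} G^T, shape (p, n)
        Z = Dinv_Gt @ np.linalg.solve(G @ Dinv_Gt, H)  # satisfies G Z = H
        # Update D^{-1}_ii = 2 ||z^i||_2, floored at eps to avoid a zero weight.
        d_inv = 2.0 * np.maximum(np.linalg.norm(Z, axis=1), eps)
    return Z


rng = np.random.default_rng(0)
G = rng.standard_normal((5, 20))
H = rng.standard_normal((5, 2))
Z = min_l21_equality(G, H)
Z0 = np.linalg.pinv(G) @ H  # first iterate (D = I): the minimum-Frobenius-norm solution
print(np.linalg.norm(Z, axis=1).sum(), np.linalg.norm(Z0, axis=1).sum())
```

Every iterate satisfies the constraint exactly, and [16] shows that the objective does not increase across iterations, so the final $\ell_{2,1}$-norm is at most that of the first (least-norm) iterate.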
Equation 6 is now of the form $\min_{\mathbf{Z}} \|\mathbf{Z}\|_{2,1}$ subject to a linear equality constraint on $\mathbf{Z}$, and is thus convex. It can be solved iteratively by the efficient algorithm of [16]. Once we have expressed $\mathbf{U}$ as a function of $\mathbf{F}$, we use Gibbs sampling to determine the remaining latent factors [20]. As mentioned before, the predictive distribution for a new parameter value $\hat{\mathcal{T}}_{ijk}$ is given by a multidimensional integral:
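$$p(\hat{\mathcal{T}}_{ijk} \mid \mathcal{T}) = \int p(\hat{\mathcal{T}}_{ijk} \mid U_{:i}, V_{:j}, W_{:k}, \alpha)\, p(\mathbf{U}, \mathbf{V}, \mathbf{W}, \alpha, \Theta_U, \Theta_V, \Theta_W \mid \mathcal{T})\, d\{\mathbf{U}, \mathbf{V}, \mathbf{W}, \alpha, \Theta_U, \Theta_V, \Theta_W\} \tag{7}$$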
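We resort to numerical approximation to evaluate this integral. To sample from the posterior, we use Markov chain Monte Carlo (MCMC) sampling, specifically Gibbs sampling. The integral can then be approximated by
$$p(\hat{\mathcal{T}}_{ijk} \mid \mathcal{T}) \approx \frac{1}{S} \sum_{s=1}^{S} p(\hat{\mathcal{T}}_{ijk} \mid U^{(s)}_{:i}, V^{(s)}_{:j}, W^{(s)}_{:k}, \alpha^{(s)}). \tag{8}$$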
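Because each term in Equation 8 is Gaussian with mean $\langle U^{(s)}_{:i}, V^{(s)}_{:j}, W^{(s)}_{:k} \rangle$, the Monte Carlo estimate of the predictive mean is simply the average of these inner products over the retained samples. The sketch below assumes a list of sampled $(\mathbf{U}, \mathbf{V}, \mathbf{W})$ triples produced by a Gibbs sampler and an optional burn-in; both the function name and the burn-in handling are illustrative.

```python
import numpy as np


def mc_predictive_mean(samples, i, j, k, burn_in=0):
    """Average the trilinear predictions for entry (i, j, k) over Gibbs samples.

    samples: sequence of (U, V, W) arrays of shapes (R, N), (R, J), (R, M).
    The first `burn_in` samples are discarded.
    """
    kept = samples[burn_in:]
    preds = [np.sum(U[:, i] * V[:, j] * W[:, k]) for U, V, W in kept]
    return float(np.mean(preds))


ones = np.ones((2, 3))
samples = [(ones, ones, ones), (2 * ones, 2 * ones, 2 * ones)]
print(mc_predictive_mean(samples, 0, 1, 2))             # (2 + 16) / 2 = 9.0
print(mc_predictive_mean(samples, 0, 1, 2, burn_in=1))  # 16.0
```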
In Equation 8, we draw $S$ samples, and the value of $S$ is set by observing the validation error. Sampling the hyperparameters ($\Theta_U$, $\Theta_V$, $\Theta_W$ and $\alpha$) is simple since we use conjugate priors for them. Also, the columns of a latent factor are conditionally independent given the other factors, so they can be sampled in parallel, which reduces computation time. The sampling algorithm iteratively samples the latent factors and obtains $\mathbf{A}$, $\mathbf{b}$, $\mathbf{V}$ and $\mathbf{W}$. Note that the algorithm must reconstruct $\mathbf{U}$ at every iteration, since there will always be a small reconstruction error between $\mathbf{U}^\top$ and $\mathbf{F}\mathbf{A} + \mathbf{1}\mathbf{b}$. This error occurs because we force $\mathbf{b}$ to be a row vector, which makes exact recovery of $\mathbf{U}$ difficult, and it in turn causes adjustments in the other latent factors. Once we obtain the four latent factors, our task is to predict the parameter values of $K$ enhanced versions having $M$ parameters each. If $f$ is the feature vector of a new image, the parameter values can be obtained simply by computing $u = \mathbf{A}^\top f + \mathbf{b}^\top$ and $\hat{\mathcal{T}}_{jk} = \sum_{r=1}^{R} u_r V_{rj} W_{rk}$. If the predicted parameter values lie beyond a certain range, a thresholding scheme based on prior knowledge can be used; for example, a logistic function may be used to constrain the predictions to a fixed interval.
4 Experiments
We conduct two experiments to show the effectiveness of our approach. We perform the first on synthetic data and compare our approach with: 1. BPMF, 2. our own discrete version of BPTF, called DBPTF, 3. multivariate linear regression (MLR), 4. twin Gaussian processes (TGP) [3], and 5. weighted k-NN regression (WKNN). For DBPTF, we make minor modifications to the original BPTF approach [24] by removing the temporal constraints on its temporal variable, since there are no temporal constraints in our case. Inference for this variable is then done in exactly the same manner as for the other, non-temporal variables, which gave a marginal boost in performance. For MLR, we use standard multivariate regression with maximum likelihood estimation; specifically, we use MATLAB's mvregress command. TGP is a generic structured prediction method that accounts for correlations between both inputs and outputs, which improves its performance compared with MLR or WKNN. WKNN predicts a test sample as a weighted combination of the outputs of its nearest training inputs. BPMF and DBPTF do not allow the inclusion of features. For MLR, TGP and WKNN, we concatenate the parameters and features of each low-quality image and use the result to predict the parameters of its versions. For our approach as well, we concatenate the parameters and features of each sample to form its feature vector. The intuition behind this concatenation is that the enhancement parameters should be a function of the input parameters as well as the features, and we did observe a performance boost after concatenating them.
The second experiment demonstrates the usefulness of our approach in a real-world setting, where we have to predict the parameters of the enhanced versions of an image (and then generate those versions by applying the predicted parameters to the input low-quality image) without using any information about the versions. We compare our approach with the competing algorithms and, additionally, with NN search, as it is used in [26, 5]. We also analyzed the effect of the offset $\mathbf{b}$ in our solution by removing it, i.e., setting $\mathbf{b} = \mathbf{0}$.
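The WKNN baseline can be sketched as follows. The text does not specify the weighting scheme, so this version assumes inverse-distance weights over the $k$ nearest training inputs (each input being the concatenated parameters and features, and each target the flattened parameters of all versions); the function name and `eps` are illustrative.

```python
import numpy as np


def wknn_predict(X_train, Y_train, x_query, k=5, eps=1e-12):
    """Weighted k-NN regression with inverse-distance weights.

    X_train: (N, d) inputs; Y_train: (N, q) targets; x_query: (d,).
    """
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dist)[:k]       # indices of the k nearest inputs
    w = 1.0 / (dist[idx] + eps)      # closer neighbours get larger weights
    return (w[:, None] * Y_train[idx]).sum(axis=0) / w.sum()


X_train = np.array([[0.0], [1.0], [10.0]])
Y_train = np.array([[0.0], [1.0], [100.0]])
print(wknn_predict(X_train, Y_train, np.array([0.5]), k=2))  # [0.5]
```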
4.1 Data set description and experiment protocol
The synthetic data is constructed with the following task in mind. We are given a training set consisting of: 1. the parameters of the input samples; 2. their features; and 3. only the parameters of the versions of each input sample. Our aim is to predict the parameters of a set of versions given the parameters and features of a new sample. In real-world problems, the parameters and features are interdependent, and the parameters of the versions depend on both. Hence we construct the synthetic data as follows.
First, we generate a set of multi-dimensional input parameters drawn from a uniform distribution. Then we generate a feature set in which each feature is related to all input parameters by a nonlinear function with randomly drawn coefficients. The parameters of the enhanced versions are also nonlinearly related to both the input parameters and the features, and the contribution of each term is controlled by a weighting parameter. We perform 3-fold cross-validation: we predict the parameters of the versions in the test set (disjoint from the training set) using the corresponding input parameters and features, and compute the RMSE between the predicted and actual parameters.
The MIT-Adobe FiveK dataset contains 5000 high-quality photographs taken with SLR cameras. Each photo is enhanced by five experts to produce five enhanced versions. We extract the average saturation, brightness and contrast of every image, which are its parameters. We also extract a 40-dimensional color histogram with 26 bins for hue and 7 bins each for saturation and value, and we compute localized contrast, brightness and saturation features. Finally, we append the average saturation, brightness and contrast of the input low-quality image, which are its parameters, to obtain the feature representation of every image. We train on 4000 images and use 500 images each for validation and testing. For each image in the testing set, we predict the parameters of five versions in a $5 \times 3$ matrix, whose entry $(k, m)$ denotes the value of parameter $m$ of the $k$-th enhanced version. To enable comparison with the expert-enhanced images of the dataset, we also compute the parameters of the five expert-enhanced versions of each image, which we treat as ground truth. We evaluate this experiment in two ways. First, we calculate the RMSE between the parameters of the five expert-enhanced photos and the parameters of the versions predicted by the five aforementioned algorithms and by our approach. Second, we conduct a subjective test under standard test settings (constant lighting, position and distance from the screen). In this case, we compare our approach with the popular NN-search-based approach. This approach first finds the nearest low-quality image in the training set to the testing image and then applies the same parameter transformation (from that training image to its $k$-th version) to the testing image to generate its $k$-th version. In our approach, we predict the parameters of the enhanced versions using the proposed formulation. We threshold the parameter values as:
(9)  
where and are multipliers for the parameter. In our case, the multipliers for saturation, brightness and contrast are: . As mentioned before, the clipping scheme in our formulation should be set using prior knowledge. Here, we know that the enhanced images usually have a larger increase (as compared to decrease) associated with their parameters. Also, changing contrast by a very small amount affects the image greatly.
The predicted parameters are applied to the input image to obtain its enhanced versions. The procedure is the same for both approaches. First, we change the contrast until the difference between the updated and the predicted contrast is marginal; we update contrast first since changing it also changes both brightness and saturation. We then update brightness and saturation until they come sufficiently close to their corresponding predicted values. This gives us five versions for each approach. To allow comparisons within a reasonable amount of time, we use a pre-trained ranking weight vector (from [5]) to select the best image of our approach (im_proposed) and of the NN approach (im_NN). In the subjective test, people are asked to compare im_proposed with each of the five enhanced versions of the NN approach as well as with im_NN, so for every input image each person performs six comparisons. The image order was randomized. We conducted the test with 11 people and 35 input images, so every person compared $6 \times 35 = 210$ pairs of images. Participants were asked to choose the more visually appealing image; a third option of preferring both images equally was also provided, which has no effect on the cumulative votes.
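The iterative adjustment described above can be illustrated for brightness or saturation with the following sketch. It assumes, for illustration only, that brightness and saturation are measured as the mean of the V and S channels of an HSV image in $[0, 1]$ and that a channel is adjusted by multiplicative scaling with clipping; clipping is what makes a single step inexact and motivates the loop. The contrast step is not shown, and the function name and tolerance are hypothetical.

```python
import numpy as np


def match_channel_mean(channel, target, tol=1e-3, max_iter=50):
    """Scale a [0, 1] channel until its mean is within `tol` of `target`.

    Each step multiplies by target / current mean and clips to [0, 1]; the
    clipped pixels lose part of the intended change, so the step is repeated.
    """
    out = np.asarray(channel, dtype=float).copy()
    for _ in range(max_iter):
        current = out.mean()
        if abs(current - target) <= tol or current == 0.0:
            break
        out = np.clip(out * (target / current), 0.0, 1.0)
    return out


rng = np.random.default_rng(0)
saturation = rng.uniform(0.0, 1.0, size=(64, 64))  # toy S channel, mean near 0.5
more_saturated = match_channel_mean(saturation, 0.7)
less_saturated = match_channel_mean(saturation, 0.3)
print(more_saturated.mean(), less_saturated.mean())
```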
4.2 Results
Our approach predicts the parameters of the synthetic data more accurately than BPMF, DBPTF, MLR, TGP and WKNN. It is worth noting that although the training error continues to decrease for our approach, BPMF and DBPTF, the testing error starts increasing after only 5 and 8 iterations for BPMF and DBPTF, respectively. In contrast, the testing error of our approach decreases rapidly for 4 iterations and then continues to decrease slowly for the next 12, as shown in Fig. 1. The RMSE on the test set for BPMF, DBPTF, MLR, TGP, WKNN and the proposed approach is and . These numbers show that our approach effectively uses the additional information provided by the features and the interaction between the input parameters and all versions to provide better predictions. On the other hand, BPMF and DBPTF start overfitting quickly due to the lack of sample feature information, while MLR and WKNN fail to model the complex interactions between variables. TGP performs better because of its ability to capture correlations between input and output. However, TGP still treats each version independently, and thus its performance falls short of our approach.
In the second experiment, the RMSE for BPMF, DBPTF, MLR, TGP, WKNN and our approach is and respectively. The testing error starts increasing after only 3 and 5 iterations for BPMF and DBPTF, respectively. Note that we do not use the clipping scheme of Equation 9 here, for a fair comparison of RMSEs between the five competing approaches and the proposed approach. For the subjective evaluation, Fig. 1 shows the cumulative votes obtained by our approach and the NN-based approach when comparing the five images produced by NN with the best image chosen by our approach. Fig. 1 also shows the votes obtained for the best images chosen by both approaches. Fig. 8 shows two input images enhanced by both approaches. The top row of Fig. 8 shows that NN reduces the saturation while increasing the brightness, whereas our approach balances both to obtain a more appealing image. In the bottom row, however, both approaches fail to produce aesthetic images, as the images become too bright, probably due to the portion of sky in the input image. For both images, most people prefer the images enhanced by our approach. Computationally, our approach is superior to NN: the complexity of our approach at testing time is independent of the dataset size, whereas NN searches the entire dataset for the closest image and then applies its parameters.
We reconstructed $\mathbf{U}$ without the offset $\mathbf{b}$ and observed a performance drop, as the model overfits. We obtain RMSEs of and on the enhancement and synthetic data, respectively. We believe the real-world enhancement data has correlations naturally embedded in it, unlike the synthetic data. Thus the performance drop is drastic in the case of enhancement, since the problem of recovering $\mathbf{U}$ only from $\mathbf{F}$ and $\mathbf{A}$ is ill-posed.
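For comparison, a minimal sketch of the NN-search baseline follows: it scans all training images to find the nearest one in feature space, so its test-time cost grows linearly with the dataset size, and then transfers that image's parameter transformation. The text does not state how the transformation is represented; this sketch assumes a per-parameter ratio between each version and its original, and the function name and toy data are illustrative.

```python
import numpy as np


def nn_baseline(F_train, X_train, Y_train, f_query, x_query):
    """Nearest-neighbour parameter transfer.

    F_train: (N, D) features; X_train: (N, M) low-quality parameters (assumed
    non-zero); Y_train: (N, K, M) parameters of the versions.
    Returns the index of the nearest image and a (K, M) prediction.
    """
    i = int(np.argmin(np.linalg.norm(F_train - f_query, axis=1)))  # O(N) scan
    ratios = Y_train[i] / X_train[i]  # assumed form of the transformation
    return i, ratios * x_query


F_train = np.array([[0.0, 0.0], [5.0, 5.0]])
X_train = np.array([[0.5, 0.5, 0.5], [0.2, 0.4, 0.8]])
Y_train = np.array([[[1.0, 0.5, 0.25]], [[0.4, 0.4, 0.4]]])  # K = 1 version
i, pred = nn_baseline(F_train, X_train, Y_train, np.array([0.1, 0.0]),
                      np.array([0.2, 0.2, 0.2]))
print(i, pred)  # i = 0, pred = [[0.4, 0.2, 0.1]]
```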
We also analyzed the effect of varying and . Since our approach uses Bayesian probabilistic inference, small variations in and do not significantly affect the performance. Table 1 lists the various parameter settings and their effect on the performance in the second experiment (i.e., image enhancement):
Parameter setting | RMSE (lower is better)

0.3162  
0.0962  
0.0907  
0.0930  
0.0872  
0.0820  
0.0821  
0.0820 
5 Conclusion
In this paper, we introduced the novel problem of predicting the parameters of enhanced versions of a low-quality image from its parameters and features. We developed an MF-inspired approach to solve this problem. We showed that by modeling the interactions across low-quality images, their parameters and their enhanced versions, we can outperform five state-of-the-art models in structured prediction and MF. We proposed to include feature information in our formulation through a convex $\ell_{2,1}$-norm minimization, which works iteratively and is efficient. Our approach thus utilizes information that helps characterize the input image, which leads to better generalization and prediction performance. Since other approaches do not model the interdependence between image features and the parameters of their corresponding enhanced versions, they start overfitting quickly and produce inferior prediction performance on the test set. Experiments on synthetic and real data demonstrated the superiority of our approach over other state-of-the-art methods.
Acknowledgement: The work was supported in part by an ARO grant (#W911NF1410371) and an ONR grant (#N000141512344). Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of ARO or ONR.
References
 [1] L. Baltrunas, B. Ludwig, and F. Ricci. Matrix factorization techniques for context aware recommendation. In Proceedings of the fifth ACM conference on Recommender systems, pages 301–304. ACM, 2011.
 [2] F. Berthouzoz, W. Li, M. Dontcheva, and M. Agrawala. A framework for content-adaptive photo manipulation macros: Application to face, landscape, and global manipulations. ACM Trans. Graph., 30(5):120, 2011.
 [3] L. Bo and C. Sminchisescu. Twin Gaussian processes for structured prediction. International Journal of Computer Vision, 87(1–2):28–52, 2010.
 [4] V. Bychkovsky, S. Paris, E. Chan, and F. Durand. Learning photographic global tonal adjustment with a database of input/output image pairs. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 97–104. IEEE, 2011.
 [5] P. S. Chandakkar, Q. Tian, and B. Li. Relative learning from web images for content-adaptive enhancement. In Multimedia and Expo (ICME), 2015 IEEE International Conference on, pages 1–6. IEEE, 2015.
 [6] C.-Y. Chen and K. Grauman. Inferring unseen views of people. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2011–2018. IEEE, 2014.
 [7] L. Chen, Q. Zhang, and B. Li. Predicting multiple attributes via relative multitask learning. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1027–1034. IEEE, 2014.
 [8] S. B. Kang, A. Kapoor, and D. Lischinski. Personalization of image enhancement. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1799–1806. IEEE, 2010.
 [9] L. Kaufman, D. Lischinski, and M. Werman. Content-aware automatic photo enhancement. In Computer Graphics Forum, volume 31, pages 2528–2540. Wiley Online Library, 2012.
 [10] N. D. Lawrence and R. Urtasun. Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 601–608. ACM, 2009.
 [11] S. Li, S. Shan, and X. Chen. Relative forest for attribute prediction. In Computer Vision–ACCV 2012, pages 316–327. Springer, 2013.
 [12] H. Ma, H. Yang, M. R. Lyu, and I. King. Sorec: social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 931–940. ACM, 2008.
 [13] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 287–296. ACM, 2011.
 [14] B. Marlin, R. S. Zemel, S. Roweis, and M. Slaney. Collaborative filtering and the missing at random assumption. arXiv preprint arXiv:1206.5267, 2012.
 [15] A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In Advances in neural information processing systems, pages 1257–1264, 2007.

 [16] F. Nie, H. Huang, X. Cai, and C. H. Ding. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1813–1821. Curran Associates, Inc., 2010.
 [17] D. Parikh and K. Grauman. Relative attributes. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 503–510. IEEE, 2011.
 [18] D. Parikh, A. Kovashka, A. Parkash, and K. Grauman. Relative attributes for enhanced human-machine communication. In AAAI, 2012.
 [19] J. D. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on Machine learning, pages 713–719. ACM, 2005.
 [20] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning, pages 880–887. ACM, 2008.
 [21] Y. Shi, M. Larson, and A. Hanjalic. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys (CSUR), 47(1):3, 2014.
 [22] Q. Song, J. Cheng, and H. Lu. Incremental matrix factorization via feature space relearning for recommender system. In Proceedings of the 9th ACM Conference on Recommender Systems, pages 277–280. ACM, 2015.

 [23] S. Wang, J. Tang, Y. Wang, and H. Liu. Exploring implicit hierarchical structures for recommender systems. In International Joint Conference on Artificial Intelligence (IJCAI). IJCAI, 2015.
 [24] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In SDM, volume 10, pages 211–222. SIAM, 2010.
 [25] J. Yan, S. Lin, S. B. Kang, and X. Tang. A learning-to-rank approach for image color enhancement. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2987–2994. IEEE, 2014.
6 Supplementary
The notation follows that of the main paper. A separate section addresses each footnote.
7 Prior Distributions
The prior distributions on the latent feature matrices are chosen to be normal. We also use a normal distribution to model the noise in the attribute difference values. The details are as follows:
(10)  
where is the precision, is an identity matrix, and is a multivariate Gaussian distribution with mean vector and covariance matrix . For both the simulation and the enhancement experiments, we use , . We now choose prior distributions for the hyperparameters.
(11)  
Here, is the Wishart distribution of a random matrix with degrees of freedom and scale matrix . The hyperprior parameters, and , are treated as constants during training and are set using prior knowledge of the application. For both experiments, we use: . The Bayesian formulation of the factorization then adjusts the model parameters within a reasonable range.
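Taken together, these priors and hyperpriors define a generative model. The sketch below illustrates it in NumPy, assuming the three-way factorization used in Bayesian probabilistic tensor factorization [24]: the attribute difference value for example i, modified version j and attribute k has mean sum_d U[d,i] V[d,j] T[d,k] and Gaussian noise with precision alpha. The names U, V, T, alpha, mu0, beta0, W0 and nu0, and every numeric setting, are illustrative assumptions rather than the values used in the experiments. The Wishart draw uses the sum-of-outer-products construction, which is exact only for integer degrees of freedom.

```python
import numpy as np


def sample_wishart(scale, dof, rng):
    """Draw W ~ Wishart(scale, dof) for an integer dof >= dim.

    W = sum_n x_n x_n^T with x_n ~ N(0, scale); exact only for integer dof.
    """
    C = np.linalg.cholesky(scale)
    x = rng.standard_normal((dof, scale.shape[0])) @ C.T  # rows are N(0, scale)
    return x.T @ x


def sample_normal_precision(mean, precision, rng):
    """Draw from N(mean, precision^-1) using the Cholesky factor of the precision."""
    L = np.linalg.cholesky(precision)       # precision = L @ L.T
    z = rng.standard_normal(mean.shape[0])
    return mean + np.linalg.solve(L.T, z)   # Cov = (L @ L.T)^-1


def generate(n_examples, n_versions, n_attrs, D=3, alpha=4.0, seed=0):
    """Hypothetical generative process for the attribute-difference tensor R."""
    rng = np.random.default_rng(seed)
    # Hypothetical hyperprior constants, chosen for illustration only.
    mu0, beta0, W0, nu0 = np.zeros(D), 2.0, np.eye(D), D + 2
    factors = []
    for n in (n_examples, n_versions, n_attrs):
        Lam = sample_wishart(W0, nu0, rng)                   # Lambda ~ W(W0, nu0)
        mu = sample_normal_precision(mu0, beta0 * Lam, rng)  # mu ~ N(mu0, (beta0 Lambda)^-1)
        cols = [sample_normal_precision(mu, Lam, rng) for _ in range(n)]
        factors.append(np.stack(cols, axis=1))               # (D, n), columns ~ N(mu, Lambda^-1)
    U, V, T = factors
    mean = np.einsum("di,dj,dk->ijk", U, V, T)               # <U_i, V_j, T_k>
    R = mean + rng.normal(scale=1.0 / np.sqrt(alpha), size=mean.shape)
    return U, V, T, R
```

For example, `generate(4, 5, 2)` returns factor matrices of shapes (3, 4), (3, 5) and (3, 2) together with a 4 × 5 × 2 tensor R. Sampling from a precision rather than a covariance avoids an explicit matrix inverse, which matches how the priors above are parameterized.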
8 Sampling of Hyperparameters
Conditional distributions in Gibbs Sampling: The joint posterior distribution can be factorized as:
(12)  
We now derive the required conditional distributions by substituting the model components described above.
Hyperparameters: Because we use the conjugate prior for the precision of the attribute values, the conditional distribution of the precision given the observed values and the latent features is also a Wishart distribution:
(13) 
where the indicator equals 1 if an attribute value is present (not missing) and 0 otherwise. Also, . For the hyperparameters of the latent example features, we can integrate out all the random variables in Equation 12 except those features and obtain the Gaussian-Wishart distribution:
(14)  
Similarly, the hyperparameters of the latent modified-version features are conditionally independent of all other parameters given those features, and their conditional distribution has the form:
(15) 
Similarly, the hyperparameters of the latent attribute features are conditionally independent of all other parameters given those features, and their conditional distribution has the form:
(16) 
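Because every hyperparameter update above is conjugate, each Gibbs step reduces to computing closed-form posterior parameters and then drawing directly. The sketch below assumes the standard Gaussian-Wishart update of Bayesian probabilistic matrix factorization [20] for a factor's mean and precision, and a one-dimensional Wishart prior on the noise precision, which is equivalent to a Gamma distribution with shape nu/2 and scale 2w. The function and argument names are hypothetical, and the code illustrates the standard form rather than transcribing Equations 13 to 16.

```python
import numpy as np


def sample_wishart(scale, dof, rng):
    """Wishart(scale, dof) draw for integer dof, as a sum of dof outer products."""
    C = np.linalg.cholesky(scale)
    x = rng.standard_normal((dof, scale.shape[0])) @ C.T
    return x.T @ x


def sample_normal_precision(mean, precision, rng):
    """Draw from N(mean, precision^-1) via the Cholesky factor of the precision."""
    L = np.linalg.cholesky(precision)
    return mean + np.linalg.solve(L.T, rng.standard_normal(mean.shape[0]))


def sample_factor_hyperparams(F, mu0, beta0, W0, nu0, rng):
    """Gibbs draw of (mu, Lambda) for a factor matrix F of shape (D, N),
    using the conjugate Gaussian-Wishart update (hypothetical helper)."""
    N = F.shape[1]
    f_bar = F.mean(axis=1)
    centered = F - f_bar[:, None]
    S = centered @ centered.T                 # scatter matrix (N times the sample covariance)
    diff = (mu0 - f_bar)[:, None]
    W_inv = np.linalg.inv(W0) + S + (beta0 * N / (beta0 + N)) * (diff @ diff.T)
    W_star = np.linalg.inv(W_inv)
    W_star = 0.5 * (W_star + W_star.T)        # symmetrize against round-off
    beta_star, nu_star = beta0 + N, nu0 + N
    mu_star = (beta0 * mu0 + N * f_bar) / beta_star
    Lam = sample_wishart(W_star, nu_star, rng)                    # Lambda | F ~ W(W*, nu*)
    mu = sample_normal_precision(mu_star, beta_star * Lam, rng)   # mu | Lambda, F
    return mu, Lam


def sample_alpha(R, mask, U, V, T, w0, nu0, rng):
    """Gibbs draw of the scalar noise precision alpha (hypothetical helper).
    A 1-D Wishart W(w, nu) is a Gamma distribution with shape nu/2 and scale 2w."""
    pred = np.einsum("di,dj,dk->ijk", U, V, T)
    sq_err = np.sum(mask * (R - pred) ** 2)   # only observed entries contribute
    nu_star = nu0 + mask.sum()
    w_star = 1.0 / (1.0 / w0 + sq_err)
    return rng.gamma(shape=nu_star / 2.0, scale=2.0 * w_star)
```

The same `sample_factor_hyperparams` call serves all three factor matrices; only the matrix passed in changes, which mirrors the parallel structure of the three Gaussian-Wishart conditionals.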
Model Parameters: We first consider the latent example (data sample) features. Since the individual feature vectors are conditionally independent given the other variables, their joint conditional distribution factorizes over the individual vectors:
(17) 
Then, for each latent example feature vector, we have:
(18) 
where the auxiliary vector is the element-wise product of the corresponding latent modified-version and attribute feature vectors.
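A Gibbs step for Equation 18 draws each example feature vector from a Gaussian. A minimal sketch follows, assuming the update of [24]: with Q_jk = V_j ⊙ T_k, the posterior precision is Lam_U + alpha · Σ Q_jk Q_jk^T and the posterior mean is its inverse applied to Lam_U · mu_U + alpha · Σ R_ijk Q_jk, with both sums running over the observed (j, k) pairs only. The names V, T, mu_U, Lam_U, alpha and the helper functions are hypothetical.

```python
import numpy as np


def sample_normal_precision(mean, precision, rng):
    """Draw from N(mean, precision^-1) via the Cholesky factor of the precision."""
    L = np.linalg.cholesky(precision)
    return mean + np.linalg.solve(L.T, rng.standard_normal(mean.shape[0]))


def sample_example_features(R, mask, V, T, mu_U, Lam_U, alpha, rng):
    """Gibbs draw of every latent example feature vector U_i given V, T and the
    hyperparameters (hypothetical helper). R and mask have shape (I, J, K);
    mask is boolean and marks observed attribute difference values."""
    D = V.shape[0]
    Q = (V[:, :, None] * T[:, None, :]).reshape(D, -1)  # column (j, k) holds V_j * T_k
    U_new = np.empty((D, R.shape[0]))
    for i in range(R.shape[0]):             # columns are conditionally independent
        obs = mask[i].reshape(-1)           # observed (j, k) pairs for example i
        Qi, r = Q[:, obs], R[i].reshape(-1)[obs]
        prec = Lam_U + alpha * (Qi @ Qi.T)  # posterior precision
        mean = np.linalg.solve(prec, Lam_U @ mu_U + alpha * (Qi @ r))
        U_new[:, i] = sample_normal_precision(mean, prec, rng)
    return U_new
```

Since the columns are conditionally independent given V, T and the hyperparameters, the loop could run in parallel without changing the sampler.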
Similarly, for each latent modified-version feature vector, we have:
(19) 
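where the auxiliary vector is now the element-wise product of the corresponding latent example and attribute feature vectors.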
Finally, for each latent attribute feature vector, we have:
(20) 
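where the auxiliary vector is the element-wise product of the corresponding latent example and modified-version feature vectors.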
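Equations 19 and 20 have the same Gaussian form as Equation 18, with the roles of the three factors permuted. The hypothetical helpers below, again assuming the three-way factorization of [24], build the auxiliary vectors for each mode and check that each mode reproduces the same noise-free tensor, i.e. that <U_i, V_j ⊙ T_k> = <V_j, U_i ⊙ T_k> = <T_k, U_i ⊙ V_j>.

```python
import numpy as np


def design_vectors(U, V, T, mode):
    """Auxiliary vectors for sampling the factor of `mode`
    (0: examples U, 1: modified versions V, 2: attributes T).

    mode 0 -> Q[:, j, k] = V_j * T_k
    mode 1 -> Q[:, i, k] = U_i * T_k
    mode 2 -> Q[:, i, j] = U_i * V_j
    The result has shape (D, n_a, n_b) for the two remaining modes, in order.
    """
    A, B = [F for m, F in enumerate((U, V, T)) if m != mode]
    return A[:, :, None] * B[:, None, :]      # element-wise products


def reconstruct_from_mode(U, V, T, mode):
    """Rebuild the noise-free tensor as <F_n, Q_ab>, with axes returned as (i, j, k)."""
    target = (U, V, T)[mode]
    Q = design_vectors(U, V, T, mode)
    R_mode = np.einsum("dn,dab->nab", target, Q)  # sampled mode's axis first
    return np.moveaxis(R_mode, 0, mode)
```

Substituting a mode's auxiliary vectors and that factor's hyperparameters into the Gaussian update of Equation 18, with the observation mask permuted the same way, yields the conditionals for the modified-version and attribute features.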