1 Introduction
There exist three genres of intelligence architectures: logics (e.g., Random Forest (Zhou & Feng, 2017), A* searching (Munos, 2012)), neurons (e.g., CNN (He et al., 2016), LSTM (Xiao, 2017a)), and probabilities (e.g., Naive Bayes (Koller & Friedman, 2009), HMM (Murphy, 2012)). The proposed neural graph (Xiao, 2017b) unifies the methodologies of logics and neurons, which provides a more powerful form for intelligence systems. However, from the perspective of uncertainty, the neural graph characterizes a system only in a deterministic way. In order to model intelligence in both a deterministic and a stochastic manner, in this paper we unify the neural and probabilistic graphs as the intelligence graph, or iGraph. From the perspective of practice, a system designer can freely employ models from both neural and probabilistic graphs (Xiao, 2017b; Jordan, 2004) in one architecture at the same time.
Specifically, the neural and probabilistic parts cooperate in the framework of forward-backward propagation (Rumelhart et al., 1988; Xiao, 2017a). In iGraph, we indicate the interfaces between neurons and probabilistic variables: the input interface links neurons to the stochastic parts, while the output interface plays the corresponding reverse role. As shown in Figure 1, we leverage the topic model (Murphy, 2012) (notice the difference from pLSI) to classify documents, where the topic distributions of words stem from the hidden representations of an LSTM, rather than from directly learned distribution vectors. Then, a softmax layer is employed for document classification over the corresponding topic distribution. Mathematically, we could derive the system as below:

p(t \mid d) = \frac{1}{N} \sum_{n=1}^{N} p(t \mid w_n), \qquad p(t \mid w_n) = \mathrm{softmax}(W_t h_n)   (1)

\hat{y} = \mathrm{softmax}\big(W_s \, p(\cdot \mid d)\big)   (2)

where W_s and W_t are the parameters of the softmax layers and h_n = f_{\mathrm{LSTM}}(w_1, \dots, w_n) is produced by the conventional network of the LSTM.

In essence, the gradients can be derived automatically, provided we can formulate the probabilistic part. Simply, in the forward pass we treat the input/output interfaces as input/output probabilistic distributions, while the sum-product algorithm formulates this process for calculation (Kschischang et al., 2001). For the example of Figure 1, the input/output distributions are p(t \mid w_n) and p(t \mid d). To start, we work out the joint probabilistic distribution over topics and words, and then the sum-product formulation is presented in Equation (1). In the backward pass, the gradient propagates through Equation (1) to the LSTM in a conventional manner. Notably, given a specific iGraph, all of these deductions can be performed automatically.
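As a concrete illustration, the forward pass of the Figure 1 example can be sketched numerically. This is a minimal sketch, not the paper's implementation: the LSTM is stubbed out by fixed random hidden states, and the names `W_t`, `W_s`, `topic_per_word` are our own illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, H, T, C = 5, 8, 3, 2   # words, hidden size, topics, classes

# Input interface: per-word hidden states (stand-ins for LSTM outputs).
h = rng.normal(size=(N, H))
W_t = rng.normal(size=(H, T))
topic_per_word = softmax(h @ W_t)       # p(t | w_n), one row per word

# Sum-product style marginalization: document topic distribution.
doc_topic = topic_per_word.mean(axis=0)  # p(t | d)

# Output interface: softmax classifier over the topic distribution.
W_s = rng.normal(size=(T, C))
y_hat = softmax(doc_topic @ W_s)         # class probabilities

# In an autodiff framework, the backward pass through these lines would
# propagate gradients from y_hat back into the LSTM parameters.
```

In a framework such as PyTorch or TensorFlow, replacing the random `h` with real LSTM outputs makes the whole chain differentiable end to end.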
With this principle of iGraph, we tackle the task of recommendation in Figure 2, as an extension of SAR (Xiao & Meng, 2017). SAR is a probabilistic model which applies the semantic principle (Xiao, 2016) to represent users/items semantically, and then treats recommendation as a process of semantic matching. There exist two disadvantages of SAR: (1.) the characterization of the category distribution by a single distribution vector is too simplistic to achieve better performance; (2.) there is no mechanism for considering recommendation diversity.
Regarding the first disadvantage, we employ a neural network to generate the category distributions from the user and item embedding representations. In this way, the category distributions are modeled precisely. Regarding the second disadvantage, we process separately the situations where the predicted rating is high or medium. Actually, a war-movie fan would not expect all the recommended films to be war-related, because some high-quality movies would not make a perfect semantic match but could still be highly rated. In the final stage of our iGraph, for the diversity of recommendation, we check the case of high-quality movies and make an appropriate recommendation based on quality and popularity.

We conduct experiments of rating prediction on the Movielens dataset to verify our model. Experimental results demonstrate the effectiveness and efficiency of our model, as it beats all the state-of-the-art baselines. We also vary the hyperparameters to test our model, and conclude that it is robust to hyperparameter settings.
Contributions. We list two contributions: (1.) The intelligence graph (iGraph) can represent the combinations and iterations of almost every intelligence method, which yields a complete representation theory of intelligence. (2.) We design a novel graph for recommendation based on SAR and tackle two issues: the oversimplified category distribution and recommendation diversity. Besides, we achieve state-of-the-art performance in the task of rating prediction.
2 Related Work
We have surveyed the relevant papers and roughly classify the existing recommendation methods into five categories: matrix factorization, neighborhood-based methods, regression-based methods, social information integration methods, and semantic analysis. Notably, most methods are based on rating matrix completion, which is a conventional setting for recommendation (Xiao & Meng, 2017).
Matrix Factorization is a classic recommendation methodology. First, this paradigm factorizes the rating matrix to obtain the user- and item-specific latent matrices U and V. Then, the method multiplies them as \hat{R} = UV^{\top} to estimate the unobservable ratings, where \hat{R} is the complete rating matrix. Because this branch hypothesizes different assumptions on the latent matrices U and V, there exist four primary subcategories according to the applied hypotheses. (1.) Basic matrix factorization generally constrains the latent factors as positive/non-negative, such as NMF (Wang & Zhang, 2013), SVP (Meka et al., 2009), MMMF (Rennie & Srebro, 2005), and PMF (Mnih & Salakhutdinov, 2012). (2.) Matrix factorization joined with neighborhood-based modeling, such as CISMF (Guo et al., 2015). (3.) Matrix factorization under various rank assumptions explores the effect of the matrix rank on representation and generalization ability to boost performance, such as LLORMA (Lee et al., 2016), (Ko et al., 2015), R1MP (Wang et al., 2014b), SoftImpute (Rahul Mazumder, 2010), (Ganti et al., 2015), (Zhang et al., 2013), (Király et al., 2012). (4.) Matrix factorization with discrete outputs treats each entry of the rating matrix as a discrete value to avoid noise and to obtain more interpretability, such as ODMC (Huo et al., 2016).

Neighborhood-Based Method is one of the most seminal approaches, assuming that similar items/users trigger similar rating preferences. There exist three main variants, including item-based, user-based, and global similarity, surveyed in (Guo et al., 2015) and (Ricci et al., 2011).
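The factorize-and-multiply step shared by this whole family can be sketched in a few lines. This is an illustrative gradient-descent sketch on a synthetic low-rank matrix, standing in for the various cited algorithms; the learning rate, iteration count, and all names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 4, 5, 2

# Synthetic low-rank "true" ratings and a partial observation mask.
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(n_items, k)).T
mask = rng.random((n_users, n_items)) < 0.6   # observed entries

# Latent factor matrices U, V, fitted to observed entries by gradient descent.
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lr = 0.05
for _ in range(2000):
    E = mask * (U @ V.T - R)                  # error on observed entries only
    U, V = U - lr * (E @ V), V - lr * (E.T @ U)

R_hat = U @ V.T   # completed matrix: also yields estimates for unobserved entries
```

The four subcategories above differ mainly in what extra constraints (non-negativity, neighborhood terms, rank penalties, discreteness) are imposed on `U` and `V` during this fit.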
Regression Based Method formulates recommendation or matrix completion as a regression problem, such as graph regression method GRALS (Cai et al., 2011), blind regression (Song, 2016), Riemannian manifold based regression (Vandereycken, 2013), and others (Davenport et al., 2014).
Social Information Integration applies social information, such as relationships between users, personalized profiles, or movies' attributes, to strengthen the recommendation. Some of the latest research includes: SR (Ma, 2013), PRMF (Liu et al., 2016), geo-specific personalization (Liu et al., 2014), social-network-based methods (Deng et al., 2014), and other social context integration methods (Wang et al., 2014a).
Semantic Analysis takes advantage of the semantic principle or the multi-view clustering methodology for recommendation (Xiao & Meng, 2017). The semantic principle conjectures that clusters and semantics correspond to each other equivalently (Xiao, 2016). Simply, SAR (Xiao & Meng, 2017) clusters the users/items in different views, where each view corresponds to a specific semantic style. Then, summarizing the cluster information in each semantic view, we obtain user- and item-specific semantic representations. Last, SAR performs a process of semantic matching to discriminate the highly rated items for users (a better match triggers a higher rating). This method is based on a probabilistic graph; the lack of nonlinear functions and complex neural network structures thus leads to unsatisfactory performance. Also, there exists the recommendation diversity issue, as previously discussed.
3 Methodology
Our iGraph is illustrated in Figure 2. In this paper, we design a recommendation model with the semantic principle. First, the probabilistic distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics.
3.1 Architecture
In the sequel, we discuss our graph in its five components: the embedding layer, the user/item-specific networks, the semantic component, rating generation, and the logic judgment.
Embedding Layer. It is necessary to represent users/items in a latent manner, because the raw input of recommendation is over-simple. Actually, most recommendation methods take up the idea of embedding. For the example of PMF, the rows/columns of the factor matrices represent the users/items in the manner of probabilistic distributions. Specifically, regarding SAR, the category distributions of users/items correspond to the functionality of embedding representations. In this paper, we take a d-dimensional real vector as our embedding, and then we concatenate the corresponding user and item embedding vectors as the embedding of the entry (u, i).
User/Item-Specific Network. SAR simply supposes that the user/item category distribution is only related to the corresponding user/item. However, we conjecture that the user category distribution should vary slightly with different items, and similarly for items. With the flexibility provided by iGraph, we employ two specific networks to transform the embedding of a rating matrix entry into the corresponding category distributions, which form the input interface of the probabilistic part. Notably, we customize these networks as multilayer perceptrons in the hyperparameter setting.
p(c_u \mid u, i) = \mathrm{softmax}\big(\mathrm{MLP}_u([e_u; e_i])\big), \qquad p(c_i \mid u, i) = \mathrm{softmax}\big(\mathrm{MLP}_i([e_u; e_i])\big)   (3)
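As a hedged sketch of this input interface, a small multilayer perceptron can map the concatenated user/item embedding to a category distribution. The layer sizes, dimensions, and function names here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def mlp_category_dist(entry_emb, W1, b1, W2, b2):
    # Two-layer perceptron: entry embedding -> category distribution.
    hidden = np.tanh(entry_emb @ W1 + b1)
    return softmax(hidden @ W2 + b2)

rng = np.random.default_rng(2)
d, h, C = 6, 10, 4   # embedding dim, hidden width, number of categories

e_u, e_i = rng.normal(size=d), rng.normal(size=d)
entry = np.concatenate([e_u, e_i])          # embedding of the entry (u, i)

params = [rng.normal(size=s) for s in [(2 * d, h), (h,), (h, C), (C,)]]
p_cat = mlp_category_dist(entry, *params)   # input interface: p(c | u, i)
```

A second network with its own parameters would produce the item-side distribution in the same way.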
Semantic Component. This component is a direct copy of SAR, which is a two-level hierarchical generation process. Simply, the model generates the different features in the first-level process, while the user/item generates the categories for each feature in the second-level process, correspondingly. Finally, the categories in each feature generate the preference r \in \{1, \dots, R\}, where R is the range of the rating. This introduces three factors: the feature distribution and the two category distributions p(c_u \mid u, i) and p(c_i \mid u, i), the latter two of which are described in the previous paragraph. Regarding the remaining preference distribution, we should consider the effect of semantic matching as:

p(r \mid c_u, c_i) = \mathrm{softmax}_r\big(\lambda \, \theta_{c_u, c_i, r}\big)   (6)

where \theta is a tabular model parameter which can be tuned in the learning process and \lambda is a hyperparameter.
Though the deduction of the probabilistic part is performed automatically by the sum-product rule, we still present the corresponding probabilistic form for clarity, where the two-level mixture indicates the process of the multi-view clustering methodology, according to the semantic principle.
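To make the two-level mixture concrete, the following minimal sketch marginalizes over features and category pairs; the sizes F, C, R and the random stand-in distributions are illustrative assumptions, not learned factors.

```python
import numpy as np

rng = np.random.default_rng(3)
F, C, R = 2, 3, 5   # features, categories, rating levels

def rand_dist(*shape):
    # Random stochastic array: the last axis sums to one.
    x = rng.random(shape)
    return x / x.sum(axis=-1, keepdims=True)

p_f  = rand_dist(F)        # first level: feature distribution
p_cu = rand_dist(F, C)     # second level: user categories per feature
p_ci = rand_dist(F, C)     # second level: item categories per feature
p_r  = rand_dist(C, C, R)  # preference given a category pair

# Sum-product marginalization over features and category pairs:
# p(r) = sum_f p(f) sum_{cu,ci} p(cu|f) p(ci|f) p(r|cu,ci)
p_rating = np.einsum("f,fa,fb,abr->r", p_f, p_cu, p_ci, p_r)
```

The `einsum` call is exactly the sum-product step: every hidden variable is summed out, leaving a distribution over the R rating levels.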
Rating Generation. Regarding the task of prediction, according to SAR, the rating is estimated as the expectation of the softmax distribution, as formulated in (7):

\hat{r} = \sum_{k=1}^{R} k \cdot p(r = k \mid u, i)   (7)

where R corresponds to the rating range, and the output interface is the distribution p(r \mid u, i), which has R entries.
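The expectation in (7) is a one-line computation over the output interface; `expected_rating` and the example distribution below are our own illustrative names.

```python
import numpy as np

def expected_rating(p_rating):
    # Expectation of the rating distribution: sum_k k * p(k), k = 1..R.
    levels = np.arange(1, len(p_rating) + 1)
    return float(levels @ p_rating)

# A 5-level example with mass concentrated on ratings 4 and 5.
print(expected_rating(np.array([0.0, 0.0, 0.0, 0.5, 0.5])))  # -> 4.5
```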
Logic Judgment. As argued in the Introduction, we should consider high-quality movies for recommendation diversity. It is necessary to judge the match degree, because we only perform diversity recommendation for a special range of predicted ratings, such as [\tau_l, \tau_u]. In fact, the highly rated items need no extra processing, so we limit the upper bound as \tau_u, while extremely semantically unmatched items should not be recommended even if they are popular or high-quality, hence we limit the lower bound as \tau_l. Notice that \tau_l and \tau_u are two distinct hyperparameters.
Generally, we complement the rating in this medium range with a neural network, whose output complements the predicted rating and whose inputs are another item embedding and the feature distribution. To summarize, we present this process in Algorithm 1. Notice that this network is customized as a multilayer perceptron in the hyperparameter setting.
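The branching of Algorithm 1 can be sketched as a plain conditional. This is a sketch under assumptions: the threshold defaults and the `quality_score` argument (standing in for the complementary network's output) are illustrative, not the paper's tuned values.

```python
def final_rating(matched_rating, quality_score, lower=2.5, upper=4.0):
    """Logic-judgment stage: keep strong and weak matches as-is, and let a
    quality/popularity-based score complement the medium range."""
    if matched_rating >= upper:   # already highly rated: no extra processing
        return matched_rating
    if matched_rating <= lower:   # semantically unmatched: never promote
        return matched_rating
    # Medium match: take the better of the semantic and quality estimates.
    return max(matched_rating, quality_score)

print(final_rating(3.0, 4.2))  # medium match, high quality -> 4.2
```

This is how a high-quality but imperfectly matched movie can still surface, which is exactly the diversity behavior motivated in the Introduction.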
3.2 From the Perspective of Interpretability
As commonly agreed by our community, neural networks or neural parts are black boxes. The critical strength of the neural style is its strong data-fitting ability, while the flaw is its lack of interpretability.
However, better interpretability brings at least three advantages. First, a good intuition inspires a better architecture; well-defined interpretability may even promote performance at a breakthrough level. Second, many areas need cooperation between machines and humans, where interpretability is a necessary option. Last, interpretability can be joined with handcrafted work such as rules, which opens an industrial way for intelligence systems. Thus, the logic and probabilistic parts, which provide strong interpretability, are critical in the intelligence graph.
On the other hand, methods from traditional algorithms and probabilistic graphs perform less satisfactorily than neural networks because of a weaker data-fitting ability. Data-fitting ability, which brought about the research trend of deep learning, is also critical for intelligence systems. In summary, we shall jointly consider data-fitting ability and interpretability in the intelligence graph.
In our opinion, the interpretable (i.e., probabilistic and logic) parts should dominate the overall framework of iGraph, and the neural parts are responsible for feature extraction and for linking the interpretable parts. In this way, the entire architecture is interpretable with strong data-fitting ability.
4 Experiment
This section is not ready in this version.
5 Conclusion
In this paper, we propose a complete representation theory of intelligence, the intelligence graph (iGraph). Based on this novel paradigm, we design a graph for recommendation to tackle two issues: the over-simple category distribution and recommendation diversity. Experimental results demonstrate the effectiveness and efficiency of our model.
References
 Cai et al. (2011) Cai, Deng, He, Xiaofei, Han, Jiawei, and Huang, Thomas S. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
 Davenport et al. (2014) Davenport, Mark A., Plan, Yaniv, Berg, Ewout Van Den, and Wootters, Mary. 1-bit matrix completion. Statistics, 3(3), 2014.
 Deng et al. (2014) Deng, Shuiguang, Huang, Longtao, and Xu, Guandong. Social networkbased service recommendation with trust enhancement. Expert Systems with Applications, 41(18):8075–8084, 2014.
 Ganti et al. (2015) Ganti, Ravi, Balzano, Laura, and Willett, Rebecca. Matrix completion under monotonic single index models. 2015.
 Guo et al. (2015) Guo, Meng Jiao, Sun, Jin Guang, and Meng, Xiang Fu. A neighborhood-based matrix factorization technique for recommendation. Annals of Data Science, 2(3):1–16, 2015.
 He et al. (2016) He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, pp. 770–778, 2016.
 Huo et al. (2016) Huo, Zhouyuan, Liu, Ji, and Huang, Heng. Optimal discrete matrix completion. 2016.
 Jordan (2004) Jordan, Michael I. Graphical models. Statistical Science, 19(1):140–155, 2004.
 Király et al. (2012) Király, Franz J., Theran, Louis, and Tomioka, Ryota. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, 62(2):299–321, 2012.
 Ko et al. (2015) Ko, Han Gyu, Son, Joo Sik, and Ko, In Young. Multi-aspect collaborative filtering based on linked data for personalized recommendation. In The International Conference on World Wide Web, pp. 49–50, 2015.
 Koller & Friedman (2009) Koller, Daphne and Friedman, Nir. Probabilistic Graphical Models: Principles and Techniques  Adaptive Computation and Machine Learning. MIT Press, 2009.
 Kschischang et al. (2001) Kschischang, Frank R., Frey, Brendan J., and Loeliger, Hans-Andrea. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.
 Lee et al. (2016) Lee, J., Kim, S., Lebanon, G., Singer, Y., and Bengio, S. LLORMA: Local low-rank matrix approximation. 2016.
 Liu et al. (2014) Liu, Jing, Li, Zechao, Tang, Jinhui, and Jiang, Yu. Personalized geospecific tag recommendation for photos on social websites. IEEE Transactions on Multimedia, 16(3):588–600, 2014.
 Liu et al. (2016) Liu, Yong, Zhao, Peilin, Liu, Xin, Wu, Min, and Li, Xiao Li. Learning optimal social dependency for recommendation. 2016.
 Ma (2013) Ma, Hao. An experimental study on implicit social recommendation. In International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 73–82, 2013.
 Meka et al. (2009) Meka, Raghu, Jain, Prateek, and Dhillon, Inderjit S. Guaranteed rank minimization via singular value projection. In NIPS, pp. 937–945, 2009.
 Mnih & Salakhutdinov (2012) Mnih, A. and Salakhutdinov, R. Probabilistic matrix factorization. In International Conference on Machine Learning, pp. 880–887, 2012.
 Munos (2012) Munos, R. The optimistic principle applied to games, optimization and planning: Towards foundations of Monte-Carlo tree search. Foundations and Trends in Machine Learning, 7(1):1–130, 2012.
 Murphy (2012) Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
 Rahul Mazumder (2010) Rahul Mazumder, Trevor Hastie, Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11(11):2287–2322, 2010.
 Rennie & Srebro (2005) Rennie, Jasson D. M. and Srebro, Nathan. Fast maximum margin matrix factorization for collaborative prediction. In International Conference on Machine Learning, pp. 713–719, 2005.
 Ricci et al. (2011) Ricci, Francesco, Rokach, Lior, Shapira, Bracha, and Kantor, Paul B. Recommender Systems Handbook. Springer, 2011.
 Rumelhart et al. (1988) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. MIT Press, 1988.
 Song (2016) Song, Dogyoon. Blind regression: nonparametric regression for latent variable models via collaborative filtering. 2016.
 Vandereycken (2013) Vandereycken, Bart. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2):1214–1236, 2013.
 Wang et al. (2014a) Wang, Fei, Jiang, Meng, Zhu, Wenwu, Yang, Shiqiang, and Cui, Peng. Recommendation with social contextual information. IEEE Transactions on Knowledge and Data Engineering, 26(11):2789–2802, 2014a.
 Wang & Zhang (2013) Wang, Yu Xiong and Zhang, Yu Jin. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 25(6):1336–1353, 2013.
 Wang et al. (2014b) Wang, Z., Lai, M. J., Lu, Z., Fan, W., Davulcu, H., and Ye, J. Rank-one matrix pursuit for matrix completion. pp. 91–99, 2014b.
 Xiao (2016) Xiao, Han. KSR: A semantic representation of knowledge graph within a novel unsupervised paradigm. arXiv preprint arXiv:1608.07685, 2016.
 Xiao (2017a) Xiao, Han. Hungarian layer: Logics empowered neural architecture. arXiv preprint arXiv:1712.02555, 2017a.
 Xiao (2017b) Xiao, Han. NDT: Neural decision tree towards fully functioned neural graph. arXiv preprint arXiv:1712.05934, 2017b.
 Xiao & Meng (2017) Xiao, Han and Meng, Lian. SAR: Semantic analysis for recommendation. 2017.
 Zhang et al. (2013) Zhang, Yongfeng, Zhang, Min, Liu, Yiqun, Ma, Shaoping, and Feng, Shi. Localized matrix factorization for recommendation based on matrix block diagonal forms. 2013.
 Zhou & Feng (2017) Zhou, Zhi Hua and Feng, Ji. Deep forest: Towards an alternative to deep neural networks. 2017.