Intelligence Graph

01/05/2018
by Han Xiao, et al.

There exist three genres of intelligence architectures: logics (e.g. Random Forest, A^* search), neurons (e.g. CNN, LSTM) and probabilities (e.g. Naive Bayes, HMM), all of which are incompatible with one another. To construct powerful intelligence systems from these various methods, we propose the intelligence graph (iGraph for short), which is composed of both neural and probabilistic graphs under the framework of forward-backward propagation. Following the paradigm of iGraph, we design a recommendation model with the semantic principle. First, the probabilistic distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics. Experimentally, we beat the state-of-the-art baselines and verify our conclusions.


Related research:

  • ω-Forest Algebras and Temporal Logics (03/25/2022): We use the algebraic framework for languages of infinite trees introduce...
  • RNE: A Scalable Network Embedding for Billion-scale Recommendation (03/10/2020): Nowadays designing a real recommendation system has been a critical prob...
  • Graded Hoare Logic and its Categorical Semantics (07/22/2020): Deductive verification techniques, based on program logics (i.e., the fa...
  • A flexible framework for defeasible logics (03/07/2000): Logics for knowledge representation suffer from over-specialization: whi...
  • Improving Location Recommendation with Urban Knowledge Graph (11/01/2021): Location recommendation is defined as to recommend locations (POIs) to u...
  • Monotonicity and Persistence in Preferential Logics (01/01/1998): An important characteristic of many logics for Artificial Intelligence i...

1 Introduction

There exist three genres of intelligence architectures: logics (e.g. Random Forest (Zhou & Feng, 2017), A^* search (Munos, 2012)), neurons (e.g. CNN (He et al., 2016), LSTM (Xiao, 2017a)) and probabilities (e.g. Naive Bayes (Koller & Friedman, 2009), HMM (Murphy, 2012)). The previously proposed neural graph (Xiao, 2017b) unifies the methodologies of logics and neurons, which provides a more powerful form for intelligence systems. However, from the perspective of uncertainty, the neural graph only characterizes a system in a deterministic way. In order to model intelligence in both a deterministic and a stochastic manner, we unify the neural and probabilistic graphs in this paper as the intelligence graph, or iGraph. From the perspective of practice, a system designer can freely employ models from both neural and probabilistic graphs (Xiao, 2017b; Jordan, 2004) in one architecture at the same time.

Specifically, the neural and probabilistic parts cooperate in the framework of forward-backward propagation (Rumelhart et al., 1988; Xiao, 2017a). In iGraph, we indicate the interface between neurons and probabilistic variables, where the input interface links neurons to the stochastic parts and the output interface plays the converse role. As shown in Figure 1, we leverage the topic model (Murphy, 2012) (note the difference from pLSI) to classify documents, where the topic distribution of words stems from the hidden representations of the LSTM rather than from directly learned distribution vectors. Then, a softmax layer is employed for document classification with the corresponding topic distribution. Mathematically, we can derive the system as below:

$$p(t \mid d) \;=\; \sum_{w \in d} p(t \mid w)\, p(w \mid d), \qquad p(t \mid w) \;=\; \mathrm{softmax}\big(f_{\mathrm{LSTM}}(w)\big) \qquad (1)$$

$$\hat{y} \;=\; \mathrm{softmax}\big(W\, p(t \mid d)\big) \qquad (2)$$

where $W$ is the parameter of the softmax and $f_{\mathrm{LSTM}}$ is the conventional network of the LSTM.

Figure 1: The illustration of the interface between neurons and probabilities for the task of text classification. To start, the sentence is parsed by the LSTM to generate context-based word representations, which are leveraged by the subsequent probabilistic graph to calculate the topic distribution. Last, a softmax layer is employed to classify the documents with the corresponding topic distributions.

In essence, the gradients can be derived automatically, as long as we can formulate the probabilistic part. Simply, in the forward pass, we treat the input/output interfaces as input/output probabilistic distributions, and the sum-product algorithm formulates this process for calculation (Kschischang et al., 2001). For the example of Figure 1, the input/output distributions are the word-level and document-level topic distributions, respectively. To start, we work out the joint probabilistic distribution, and then the formulation of the sum-product algorithm is presented in Equation (1). In the backward pass, the gradient propagates through Equation (1) to the LSTM in the conventional manner. Notably, given a specific iGraph, all of these deductions can be performed automatically.
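As an illustration of this interface, the following is a minimal sketch of the Figure 1 pipeline, assuming PyTorch; the dimensions, the uniform word mixture standing in for $p(w \mid d)$, and the names (TopicClassifier, to_topic) are illustrative assumptions rather than the original implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicClassifier(nn.Module):
    """Sketch of the Figure 1 pipeline: LSTM (neurons) -> per-word topic
    distributions (probabilities) -> softmax classifier (output interface)."""

    def __init__(self, vocab=1000, embed=64, hidden=128, topics=20, classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.to_topic = nn.Linear(hidden, topics)    # input interface
        self.classify = nn.Linear(topics, classes)   # output interface

    def forward(self, words):                        # words: (batch, seq)
        h, _ = self.lstm(self.embed(words))          # (batch, seq, hidden)
        p_t_w = F.softmax(self.to_topic(h), dim=-1)  # p(t | w) per word
        # Sum-product step: marginalize words out under a uniform p(w | d),
        # yielding the document-level topic distribution p(t | d).
        p_t_d = p_t_w.mean(dim=1)                    # (batch, topics)
        return F.log_softmax(self.classify(p_t_d), dim=-1)

model = TopicClassifier()
logp = model(torch.randint(0, 1000, (2, 12)))        # two toy documents
loss = F.nll_loss(logp, torch.tensor([0, 3]))
loss.backward()   # gradients flow back through the probabilistic part
```

Note that the backward call differentiates through the marginalization step exactly as described: the probabilistic part is simply another differentiable node in the graph.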

Figure 2: The iGraph of our model. First, the probabilistic distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics.

With this principle of iGraph, we tackle the task of recommendation in Figure 2, as an extension of SAR (Xiao & Meng, 2017). SAR is a probabilistic model which applies the semantic principle (Xiao, 2016) to represent users/items semantically, and then treats recommendation as a process of semantic matching. SAR has two disadvantages: (1.) its characterization of the category distribution as a single distribution vector is too simplified to achieve better performance; (2.) it has no mechanism for considering recommendation diversity.

Regarding the first disadvantage, we employ a neural network to generate the category distributions from the user and item embedding representations. In this way, the category distributions are modeled precisely. Regarding the second disadvantage, we separately process the situations where the predicted rating is high or medium. Actually, a war-movie fan would not expect all the recommended films to be war-related, because some high-quality movies would not make a perfect semantic match but could still be highly rated. In the final stage of our iGraph, for recommendation diversity, we check the case of high-quality movies and make an appropriate recommendation based on quality and popularity.

We conduct rating-prediction experiments on the Movielens dataset to verify our model. The experimental results demonstrate the effectiveness and efficiency of our model, as it beats all the state-of-the-art baselines. We also vary the hyper-parameters to test our model, and conclude that it is robust to hyper-parameter settings.

Contributions. We list two contributions: (1.) The intelligence graph (iGraph) can represent all the combinations and iterations of almost every intelligence method, which yields a complete representation theory of intelligence. (2.) We design a novel graph for recommendation based on SAR and tackle two issues: the oversimplified category distribution and recommendation diversity. Besides, we achieve state-of-the-art performance in the task of rating prediction.

2 Related Work

We have surveyed the relevant papers and roughly classified the existing recommendation methods into five categories: matrix factorization, neighborhood-based methods, regression-based methods, social information integration methods and semantic analysis. Notably, most methods are based on rating matrix completion, which is the conventional setting for recommendation (Xiao & Meng, 2017).

Matrix Factorization is a classic recommendation methodology. First, this paradigm factorizes the rating matrix to get the user/item-specific latent matrices $U$ and $V$. Then, the method multiplies $UV^{\top}$ to estimate the unobservable ratings, where $\hat{R} = UV^{\top}$ is the complete rating matrix. Because this branch hypothesizes different assumptions on the latent matrices $U$ and $V$, there exist four primary subcategories according to the applied hypotheses. (1.) Basic matrix factorization generally constrains the latent factors to be positive/non-negative, such as NMF (Wang & Zhang, 2013), SVP (Meka et al., 2009), MMMF (Rennie & Srebro, 2005), PMF (Mnih & Salakhutdinov, 2012). (2.) Matrix factorization joined with neighborhood-based modeling, such as CISMF (Guo et al., 2015). (3.) Matrix factorization under various rank assumptions explores the effect of the matrix rank on representation and generalization ability to boost performance, such as LLORMA (Lee et al., 2016), (Ko et al., 2015), R1MP (Wang et al., 2014b), SoftImpute (Mazumder et al., 2010), (Ganti et al., 2015), (Zhang et al., 2013), (Király et al., 2012). (4.) Matrix factorization with discrete outputs treats each entry of the rating matrix as a discrete value to avoid noise and obtain more interpretations, such as ODMC (Huo et al., 2016).
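As a reference point for this paradigm, the following is a minimal NumPy sketch of basic matrix factorization under a squared loss; the shapes, learning rate and regularization weight are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of basic matrix factorization: approximate the rating
# matrix R by U @ V.T, fitting only the observed entries by gradient
# descent with L2 regularization. All sizes here are illustrative.
rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 8
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.2        # observed entries

U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
lr, reg = 0.01, 0.02
for _ in range(200):
    err = mask * (R - U @ V.T)                     # error on observed cells only
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

R_hat = U @ V.T                                    # completed rating matrix
```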

Neighborhood-Based Method is one of the most seminal approaches, assuming that similar items/users trigger similar rating preferences. There exist three main variants, including item-based, user-based and global similarity, surveyed in (Guo et al., 2015) and (Ricci et al., 2011).

Regression-Based Method formulates recommendation or matrix completion as a regression problem, such as the graph regression method GRALS (Cai et al., 2011), blind regression (Song, 2016), Riemannian-manifold-based regression (Vandereycken, 2013), and others (Davenport et al., 2014).

Social Information Integration applies social information, such as relationships between users, personalized profiles or movies' attributes, to strengthen the recommendation. Some of the latest research includes: SR (Ma, 2013), PRMF (Liu et al., 2016), geo-specific personalization (Liu et al., 2014), social-network-based methods (Deng et al., 2014) and other social context integration methods (Wang et al., 2014a).

Semantic Analysis takes advantage of the semantic principle or the multi-view clustering methodology for recommendation (Xiao & Meng, 2017). The semantic principle conjectures that clusters and semantics correspond to each other (Xiao, 2016). Simply, SAR (Xiao & Meng, 2017) clusters the users/items in different views, where each view corresponds to a specific semantic style. Then, summarizing the cluster information in each semantic view, we obtain the user/item-specific semantic representation. Last, SAR performs a process of semantic matching to discriminate the highly rated items for users (a better match triggers a higher rating). This method is based on a probabilistic graph; the lack of non-linear functions and complex neural network structures leads to unsatisfactory performance. Also, there exists the recommendation diversity issue discussed previously.

3 Methodology

Our iGraph is illustrated in Figure 2. In this paper, we design a recommendation model with the semantic principle. First, the probabilistic distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics.

3.1 Architecture

In the sequel, we discuss our graph in terms of its five components: the embedding layer, the user/item-specific networks, the semantic component, rating generation and logic judgment.

Embedding Layer. It is necessary to represent users/items in a latent manner, because the raw input of recommendation is overly simple. Actually, most recommendation methods take up the idea of embedding. In the example of PMF, the rows/columns of the factor matrices represent the user/item in the manner of a probabilistic distribution. Specifically, in SAR, the category distributions of users/items fulfill the functionality of embedding representations. In this paper, we take a $d$-dimensional real vector as our embedding, and then we concatenate the corresponding user and item embedding vectors as the embedding of the entry $(u, i)$.

User/Item-Specific Network. SAR simply supposes that the user/item category distribution is related only to the corresponding user/item. However, we conjecture that the user category distribution should vary slightly with different items, and similarly for the item. With the flexibility provided by iGraph, we employ two specific networks to transform the embedding of a rating matrix entry into the corresponding category distributions, which form the input interface of the probabilistic part. Notably, we customize the networks as multi-layer perceptrons in the hyper-parameter setting:

$$p(c_u \mid u, i) = \mathrm{softmax}\big(\mathrm{MLP}_u([e_u; e_i])\big), \qquad p(c_i \mid u, i) = \mathrm{softmax}\big(\mathrm{MLP}_i([e_u; e_i])\big) \qquad (3)$$
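A minimal sketch of the embedding layer and the two user/item-specific networks might look as follows, assuming PyTorch; all sizes and the module names (user_net, item_net) are hypothetical stand-ins for our hyper-parameter settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch: the concatenated (user, item) embedding is mapped by two MLPs
# to the user and item category distributions that feed the probabilistic
# part. All dimensions below are illustrative assumptions.
n_users, n_items, d, n_categories = 1000, 1500, 32, 10

user_embed = nn.Embedding(n_users, d)
item_embed = nn.Embedding(n_items, d)
user_net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, n_categories))
item_net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, n_categories))

u, i = torch.tensor([3]), torch.tensor([7])           # one (user, item) entry
e = torch.cat([user_embed(u), item_embed(i)], dim=-1)
p_cat_user = F.softmax(user_net(e), dim=-1)           # user category distribution
p_cat_item = F.softmax(item_net(e), dim=-1)           # item category distribution
```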

Semantic Component. This component is a direct copy of SAR, which is a two-level hierarchical generation process. Simply, the model generates different features in the first-level process, while the user/item generates the categories for each feature in the second-level process, correspondingly. Finally, the categories in each feature generate the preference over the ratings $1, \dots, R$, where $R$ is the range of the rating. Three factors are introduced, two of which (the user and item category distributions) are described in the previous paragraph. Regarding the remaining distribution, we should consider the effect of semantic matching as:

(6)

where the former factor is a tabular model parameter which can be tuned in the learning process and the latter is a hyper-parameter.

Though the deduction of the probabilistic part is performed automatically by the sum-product rule, we still present the corresponding probabilistic form for clarity, where the two-level mixture realizes the multi-view clustering methodology, in accordance with the semantic principle.
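For clarity, the following is a hedged sketch of such a two-level mixture evaluated by sum-product, assuming PyTorch; the random factor tensors are stand-ins for SAR's learned distributions.

```python
import torch

# Two-level mixture: features at the first level, categories per feature
# at the second, each category contributing a preference distribution
# over ratings. Tensors below are illustrative stand-ins.
n_features, n_categories, R = 4, 10, 5
p_f = torch.softmax(torch.randn(n_features), dim=-1)                        # p(f)
p_c_given_f = torch.softmax(torch.randn(n_features, n_categories), dim=-1)  # p(c | f)
p_r_given_c = torch.softmax(torch.randn(n_categories, R), dim=-1)           # p(r | c)

# Sum-product over the hierarchy: p(r) = sum_f p(f) sum_c p(c|f) p(r|c).
p_r = torch.einsum('f,fc,cr->r', p_f, p_c_given_f, p_r_given_c)             # R entries
```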

Rating Generation. Regarding the task of prediction, following SAR, the rating is estimated as the expectation of the softmax distribution, as formulated in Equation (7).

$$\hat{r} \;=\; \sum_{r=1}^{R} r \cdot p(r \mid u, i) \qquad (7)$$

where $R$ corresponds to the rating range, and the distribution $p(r \mid u, i)$, which has $R$ entries, is the output interface.
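Concretely, Equation (7) amounts to a dot product between the rating values and the output-interface distribution; the 1-5 scale below is an illustrative assumption.

```python
import torch

# Expected rating under the output-interface distribution (Eq. 7).
p_rating = torch.tensor([0.05, 0.10, 0.20, 0.40, 0.25])  # p(r | u, i), R = 5
ratings = torch.arange(1, 6, dtype=torch.float)           # rating values 1..5
r_hat = (ratings * p_rating).sum()                        # tensor(3.7000)
```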

Input: predicted rating $\hat{r}$, hyper-parameters $\lambda_{low}$ and $\lambda_{high}$, feature distribution $p(f)$ and another item embedding $e'_i$.
Output: final predicted rating $r^*$.
1:  $r^* \leftarrow \hat{r}$
2:  if $\lambda_{low} < \hat{r} < \lambda_{high}$ then
3:    $r^* \leftarrow g([e'_i; p(f)])$
4:  end if
5:  return $r^*$
Algorithm 1: Diversity Recommendation.

Logic Judgment. As argued in the Introduction, we should consider high-quality movies for recommendation diversity. It is necessary to judge the degree of match, because we only perform diversity recommendation for a special range of predicted ratings, namely $(\lambda_{low}, \lambda_{high})$. In fact, highly rated items need no extra processing, thus we limit the upper bound as $\lambda_{high}$; meanwhile, extremely semantically unmatched items should not be recommended even if they are popular or high-quality, hence we limit the lower bound as $\lambda_{low}$. Notice that $\lambda_{low}$ and $\lambda_{high}$ are two distinct hyper-parameters.

Generally, we complement the rating in the range $(\lambda_{low}, \lambda_{high})$ with a neural network as $r^* = g([e'_i; p(f)])$, where $g$ is the corresponding network and the input consists of another item embedding $e'_i$ and the feature distribution $p(f)$. To summarize, we present this process in Algorithm 1. Notice that the network is customized as a multi-layer perceptron in the hyper-parameter setting.
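The following sketch mirrors Algorithm 1, assuming PyTorch; the threshold values, the complement network g and the symbol names are hypothetical stand-ins for the paper's notation.

```python
import torch
import torch.nn as nn

# Sketch of Algorithm 1 (diversity recommendation). All sizes and
# thresholds are illustrative assumptions.
d, n_features = 32, 10
g = nn.Sequential(nn.Linear(d + n_features, 32), nn.ReLU(), nn.Linear(32, 1))

def final_rating(r_hat, item_embed2, p_feature, lam_low=2.5, lam_high=4.0):
    # Highly rated items need no extra processing, and badly matched
    # items are never promoted; only the medium band is complemented.
    if lam_low < r_hat < lam_high:
        x = torch.cat([item_embed2, p_feature], dim=-1)
        return g(x).squeeze(-1)       # quality/popularity-based rating
    return r_hat

r_star = final_rating(3.2, torch.randn(d),
                      torch.softmax(torch.randn(n_features), dim=-1))
```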

3.2 From the Perspective of Interpretability

As commonly agreed in our community, neural networks, or neural parts, are black boxes. The critical strength of the neural style is its strong data-fitting ability, while its flaw is the lack of interpretability.

However, better interpretability brings at least three advantages. First, a good intuition inspires a better architecture; well-defined interpretability may even promote performance at a breakthrough level. Second, many areas need cooperation between machines and humans, where interpretability is a necessary option. Last, interpretability can be joined with handcrafted work such as rules, which opens an industrial path for intelligence systems. Thus, the logic and probabilistic parts, which provide strong interpretability, are critical in the intelligence graph.

On the other hand, the methods from traditional algorithms and probabilistic graphs perform less satisfactorily than neural networks, because of a weaker data-fitting ability. Data-fitting ability, which drives the research trend of deep learning, is also critical for intelligence systems. In summary, we shall jointly consider data-fitting ability and interpretability in the intelligence graph.

In our opinion, the interpretable (i.e. probabilistic and logic) parts should dominate the overall framework of iGraph, while the neural parts are responsible for feature extraction and for the links between the interpretable parts. In this way, the entire architecture is interpretable and retains strong data-fitting ability.

4 Experiment

This section is not ready in this version.

5 Conclusion

In this paper, we propose a complete representation theory of intelligence, the intelligence graph (iGraph). Based on this novel paradigm, we design a graph for recommendation to tackle two issues: the oversimplified category distribution and recommendation diversity. Experimental results demonstrate the effectiveness and efficiency of our model.

References

  • Cai et al. (2011) Cai, Deng, He, Xiaofei, Han, Jiawei, and Huang, Thomas S. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
  • Davenport et al. (2014) Davenport, Mark A., Plan, Yaniv, Berg, Ewout Van Den, and Wootters, Mary. 1-bit matrix completion. Information and Inference, 3(3), 2014.
  • Deng et al. (2014) Deng, Shuiguang, Huang, Longtao, and Xu, Guandong. Social network-based service recommendation with trust enhancement. Expert Systems with Applications, 41(18):8075–8084, 2014.
  • Ganti et al. (2015) Ganti, Ravi, Balzano, Laura, and Willett, Rebecca. Matrix completion under monotonic single index models. 2015.
  • Guo et al. (2015) Guo, Meng Jiao, Sun, Jin Guang, and Meng, Xiang Fu. A neighborhood-based matrix factorization technique for recommendation. Annals of Data Science, 2(3):1–16, 2015.
  • He et al. (2016) He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Huo et al. (2016) Huo, Zhouyuan, Liu, Ji, and Huang, Heng. Optimal discrete matrix completion. 2016.
  • Jordan (2004) Jordan, Michael I. Graphical models. Statistical Science, 19(1):140–155, 2004.
  • Király et al. (2012) Király, Franz J., Theran, Louis, and Tomioka, Ryota. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, 62(2):299–321, 2012.
  • Ko et al. (2015) Ko, Han Gyu, Son, Joo Sik, and Ko, In Young. Multi-aspect collaborative filtering based on linked data for personalized recommendation. In The International Conference on World Wide Web, pp. 49–50, 2015.
  • Koller & Friedman (2009) Koller, Daphne and Friedman, Nir. Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. MIT Press, 2009.
  • Kschischang et al. (2001) Kschischang, Frank R, Frey, Brendan J, and Loeliger, Hans Andrea. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.
  • Lee et al. (2016) Lee, J., Kim, S., Lebanon, G., Singer, Y., and Bengio, S. Llorma: Local low-rank matrix approximation. 2016.
  • Liu et al. (2014) Liu, Jing, Li, Zechao, Tang, Jinhui, and Jiang, Yu. Personalized geo-specific tag recommendation for photos on social websites. IEEE Transactions on Multimedia, 16(3):588–600, 2014.
  • Liu et al. (2016) Liu, Yong, Zhao, Peilin, Liu, Xin, Wu, Min, and Li, Xiao Li. Learning optimal social dependency for recommendation. 2016.
  • Ma (2013) Ma, Hao. An experimental study on implicit social recommendation. In International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 73–82, 2013.
  • Meka et al. (2009) Meka, Raghu, Jain, Prateek, and Dhillon, Inderjit S. Guaranteed rank minimization via singular value projection. NIPS, pp. 937–945, 2009.
  • Mnih & Salakhutdinov (2012) Mnih, A. and Salakhutdinov, R. Probabilistic matrix factorization. In International Conference on Machine Learning, pp. 880–887, 2012.
  • Munos (2012) Munos, R. The optimistic principle applied to games, optimization and planning: Towards foundations of Monte-Carlo tree search. Foundations and Trends in Machine Learning, 7(1):1–130, 2012.
  • Murphy (2012) Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  • Mazumder et al. (2010) Mazumder, Rahul, Hastie, Trevor, and Tibshirani, Robert. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11(11):2287–2322, 2010.
  • Rennie & Srebro (2005) Rennie, Jasson D. M. and Srebro, Nathan. Fast maximum margin matrix factorization for collaborative prediction. In International Conference on Machine Learning, pp. 713–719, 2005.
  • Ricci et al. (2011) Ricci, Francesco, Rokach, Lior, Shapira, Bracha, and Kantor, Paul B. Recommender Systems Handbook. Springer, 2011.
  • Rumelhart et al. (1988) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. MIT Press, 1988.
  • Song (2016) Song, Dogyoon. Blind regression: Nonparametric regression for latent variable models via collaborative filtering. 2016.
  • Vandereycken (2013) Vandereycken, Bart. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2):1214–1236, 2013.
  • Wang et al. (2014a) Wang, Fei, Jiang, Meng, Zhu, Wenwu, Yang, Shiqiang, and Cui, Peng. Recommendation with social contextual information. IEEE Transactions on Knowledge and Data Engineering, 26(11):2789–2802, 2014a.
  • Wang & Zhang (2013) Wang, Yu Xiong and Zhang, Yu Jin. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 25(6):1336–1353, 2013.
  • Wang et al. (2014b) Wang, Z., Lai, M. J., Lu, Z., Fan, W., Davulcu, H., and Ye, J. Rank-one matrix pursuit for matrix completion. In International Conference on Machine Learning, pp. 91–99, 2014b.
  • Xiao (2016) Xiao, Han. KSR: A semantic representation of knowledge graph within a novel unsupervised paradigm. arXiv preprint arXiv:1608.07685, 2016.
  • Xiao (2017a) Xiao, Han. Hungarian layer: Logics empowered neural architecture. arXiv preprint arXiv:1712.02555, 2017a.
  • Xiao (2017b) Xiao, Han. NDT: Neural decision tree towards fully functioned neural graph. arXiv preprint arXiv:1712.05934, 2017b.
  • Xiao & Meng (2017) Xiao, Han and Meng, Lian. SAR: Semantic analysis for recommendation. 2017.
  • Zhang et al. (2013) Zhang, Yongfeng, Zhang, Min, Liu, Yiqun, Ma, Shaoping, and Feng, Shi. Localized matrix factorization for recommendation based on matrix block diagonal forms. 2013.
  • Zhou & Feng (2017) Zhou, Zhi Hua and Feng, Ji. Deep forest: Towards an alternative to deep neural networks. 2017.