Inferring User Preferences by Probabilistic Logical Reasoning over Social Networks

11/11/2014 ∙ by Jiwei Li, et al. ∙ Stanford University, The Ohio State University

We propose a framework for inferring the latent attitudes or preferences of users by performing probabilistic first-order logical reasoning over the social network graph. Our method answers questions about Twitter users like "Does this user like sushi?" or "Is this user a New York Knicks fan?" by building a probabilistic model that reasons over user attributes (the user's location or gender) and the social network (the user's friends and spouse), via inferences like homophily (I am more likely to like sushi if my spouse or friends like sushi; I am more likely to like the Knicks if I live in New York). The algorithm uses distant supervision, semi-supervised data harvesting and vector space models to extract user attributes (e.g. spouse, education, location) and preferences (likes and dislikes) from text. The extracted propositions are then fed into a probabilistic reasoner (we investigate both Markov Logic and Probabilistic Soft Logic). Our experiments show that probabilistic logical reasoning significantly improves the performance on attribute and relation extraction, and also achieves an F-score of 0.791 at predicting a user's likes or dislikes, significantly better than two strong baselines.

1 Introduction

Extracting the latent attitudes or preferences of users on the web is an important goal, both for practical applications like product recommendation, targeted online advertising, and friend recommendation, and for helping social scientists and political analysts gain insights into public opinion and user behavior.

Evidence for latent preferences can come either from attributes of a user or from preferences of other people in their social network. For example, people from Illinois may be more likely to like the Chicago Bears, while people whose friends like sushi may be more likely to like sushi.

A popular approach to draw on such knowledge to help extract user preferences is to make use of collaborative filtering, typically applied on structured data describing explicitly provided user preferences (e.g. movie ratings), and often enriched by information from a social network [18, 25, 35, 38, 52]. These methods can thus combine information from shared preferences and attributes with information about social relations.

In many domains, however, these user preferences and user attributes are not explicitly provided, and we may not even have explicit knowledge of relations in the social network. In such cases, we may first need to estimate these latent attributes or preferences, resulting in only probabilistic estimates of each of these sources of knowledge. How can we reason about user preferences given only weak probabilistic sources of knowledge about users’ attributes, preferences, and social ties? This problem occurs in domains like Twitter, where knowledge about users’ attitudes, attributes, and social relations must be inferred.

We propose to infer user preferences on domains like Twitter without explicit information by applying relational reasoning frameworks like Markov Logic Networks (MLN) [71] and Probabilistic Soft Logic (PSL) [26] to help infer these relational rules. Such probabilistic logical systems are able to combine evidence probabilistically to draw logical inference.

For example, such systems could learn individual probabilistic inference rules expressing facts like “people who work in IT companies like electronic devices” with an associated probability (in this case 0.242):

$\text{Work-In-IT-company}(A) \Rightarrow \text{like-electronic-device}(A)$ (0.242)

Such systems are also able to perform global inference over an entire network of such rules, combining probabilistic information about user attributes, preferences, and relations to predict preferences of other users.

Our algorithm has two stages. In the first stage we extract user attributes. Unlike structured knowledge bases such as Freebase and Wikipedia, propositional knowledge describing the attributes of entities in social networks is very sparse. Although some social media websites (such as Facebook or LinkedIn) do support a structured data format for personal attributes, these attributes may still be sparse, since only a small proportion of users fill out any particular profile fact, and many sites (such as Twitter) do not provide them at all. On the other hand, users of online social media frequently publish messages describing their preferences and activities, often explicitly mentioning attributes such as their Job, Religion, or Education [48]. We propose a text extraction system for Twitter that combines distant supervision [15], semi-supervised data harvesting (e.g., [40, 41]) and vector space models [5, 55] to automatically extract structured profiles from the text of users’ messages. Based on this approach, we are able to construct a comprehensive list of personal attributes which are explicitly mentioned in text (e.g., Like/Dislike, Live-in, Work-for) and user relations (e.g., Friend, Couple).

While coverage of user profile information can dramatically be increased by extracting information from text, not all users explicitly mention all of their attributes. To address this, in the second stage of our work we further investigate whether it is possible to extend the coverage of extracted user profiles by inferring attributes not explicitly mentioned in text through logical inference.

Finally, we feed the extracted attributes and relations into relational reasoning frameworks, including Markov Logic Networks (MLN) [71] and Probabilistic Soft Logic (PSL) [26], to infer the relational rules among users, attributes and user relations that allow us to predict user preferences.

We evaluate the system on a range of prediction tasks, including preference prediction (liking or disliking) as well as attributes like location and relations like friend-of.

The system described in this paper provides new perspectives for understanding and predicting the interests, tendencies and behaviors of social media users in everyday life. While our experiments are limited to one dataset, Twitter, the techniques are general and can be adapted to other datasets with minor adjustments. The major contributions of this paper can be summarized as follows:

  • We present an attempt to perform probabilistic logical reasoning on social networks.

  • Our framework estimates the attributes of online social media users without requiring explicit mentions.

  • Our framework combines probabilistic information about user attributes, preferences, and relations to predict latent relations and preferences.

  • We present a large-scale user dataset specific for this task.

The next sections show how user attributes, relations, and preferences are extracted from text and introduce the probabilistic logical frameworks. Our algorithm and results are presented in Sections 4 and 5.

2 Extracting Probabilistic Logical Predicates

Given the message streams from Twitter users, our first task is to extract information about user attributes, relations, and preferences in logical form; these will then be input to our global logical inference network.

We represent these facts by two kinds of logic objects: predicates and functions. Functions represent mappings from one object to another, returning an object, as in CapitalOf(France)=Paris. Predicates represent whether a relation holds between two objects and return a boolean value. For example, if usrA and usrB are friends on Twitter, the predicate IsFriend(usrA, usrB)=true. Predicates and functions can be transformed into each other. Given the function WifeOf(usrA)=usrB, the predicate IsCouple(usrA, usrB)=true will naturally hold. As we will demonstrate later, all functions are transformed to predicates in the graph construction procedure.

2.1 Dataset

We use a random sample of 0.5 million Twitter users, after discarding users with fewer than 10 tweets. We crawled their published tweets and their network using the Twitter API (due to API limitations, we can crawl at most 2,000 tweets for each user), resulting in a dataset of roughly 75 million tweets.

2.2 User Attributes

In the next sections we first briefly describe how we extract predicates for user attributes (location, education, gender) and user relations (friend, spouse), and then focus in detail on the extraction of user preferences (like/dislike).

2.2.1 Location

Our goal is to associate one of the 50 states of the United States with each user. While there has been a significant amount of work on inferring the location of a given published tweet (e.g., [10, 17, 78]), there is less focus on user-level inference. In this paper, we employ a rule-based approach for user-location identification. We select all geo-tagged tweets from a specific user, and say an entity corresponds to the location of the current user if it satisfies the following criteria, designed to ensure high precision (although with a natural corresponding drop in recall):

  1. The user published more than 10 tweets from the location.

  2. The user published from the location in at least three different months of a year.

We only consider locations within the United States and entities are matched to state names via Google Geocoding. In the end, we are able to extract locations for of the users from our dataset.
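
The two criteria above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the function name `infer_user_location`, the `(location, date)` input format, and the tie-break (choose the qualifying location with the most tweets) are assumptions, and "three different months of a year" is read here as three distinct calendar months within a single year.

```python
from collections import defaultdict
from datetime import date


def infer_user_location(geo_tweets, min_tweets=10, min_months=3):
    """Return a location for one user, or None if no location qualifies.

    geo_tweets: list of (location, date) pairs, one per geo-tagged tweet
    (hypothetical input format).
    """
    counts = defaultdict(int)
    months = defaultdict(lambda: defaultdict(set))  # location -> year -> months
    for location, day in geo_tweets:
        counts[location] += 1
        months[location][day.year].add(day.month)

    qualifying = [
        loc for loc in counts
        # Criterion 1: more than `min_tweets` tweets from the location.
        if counts[loc] > min_tweets
        # Criterion 2: at least `min_months` different months within one year.
        and any(len(m) >= min_months for m in months[loc].values())
    ]
    if not qualifying:
        return None
    # Assumed tie-break: the qualifying location with the most tweets.
    return max(qualifying, key=lambda loc: counts[loc])
```

Returning `None` when no location qualifies mirrors the high-precision, low-recall design: users without enough evidence simply receive no location.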

2.2.2 Education/Job

Job and education attributes are extracted by combining a rule-based approach with an existing probabilistic system described in [48].

First, for each user, we obtained his or her full name and fed it into the Google+ API (https://developers.google.com/+/api/). Many Google+ profiles are publicly accessible and many users explicitly list attributes such as their education and job. The major challenge involved here is name disambiguation, to match users’ Twitter accounts to Google+ accounts. (A small proportion of Google+ accounts contain direct Twitter links; in those cases, accounts can be directly matched.) We adopted the strategy taken in [48]: if more than 10 percent of friends, and at least 20 friends, are shared by Google+ circles and Twitter followers, we assume that the two accounts point to the same person. percent of users’ job or education attributes are finalized based on their Google+ accounts.

For cases where user names cannot be disambiguated or no relevant information is obtained from Google+, we turn to the system provided by Li et al. [48] (for details about the algorithms in [48], see Section 6). This system extracts education or job entities from the published Twitter content of the user. For each Twitter user, the system returns the education or job entity mentioned in the user's tweets, associated with a corresponding probability, for example,

Since the Li et al. system requires the user to explicitly mention their education or job entities in their published content, it is again low-recall: another percent of users’ job or education attributes are inferred from the system.

2.2.3 Gender

Many frameworks have been devoted to gender prediction from Twitter posts (e.g., [9, 12, 66, 84]), studying whether high-level tweet features (e.g., link, mention, hashtag frequency) can help in the absence of highly predictive user name information. Since our goal is not guessing the gender without names but rather studying the extent to which global probabilistic logical inference over the social network can improve the accuracy of local predictors, we implement a high-precision rule-based approach that uses the national Social Security Gender Database (SSGD, http://www.ssa.gov/oact/babynames/names.zip). SSGD contains first-name records annotated for gender for every US birth since 1880. Many names are highly gender-specific, while others are ambiguous. We assign a user a gender if his/her first name appears in the dataset for one gender at least 20 times as often as for the other. Using this rule we assign gender to of the users in our dataset.
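
The 20-to-1 rule can be illustrated with a short sketch. The `name_counts` table, its layout, and the function name `assign_gender` are hypothetical stand-ins for the SSGD records; only the ratio threshold comes from the text.

```python
def assign_gender(first_name, name_counts, ratio=20):
    """Return "male", "female", or None for an ambiguous or unknown name.

    name_counts: dict mapping a lowercase first name to a
    (male_count, female_count) pair (hypothetical layout).
    """
    male, female = name_counts.get(first_name.lower(), (0, 0))
    # A gender is assigned only if the name occurs at least `ratio` times
    # as often for that gender as for the other one.
    if male > 0 and male >= ratio * female:
        return "male"
    if female > 0 and female >= ratio * male:
        return "female"
    return None
```

Names that fail the ratio test are left unlabeled, which trades recall for precision in the same way as the location rule.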

2.3 User Relations

The user-user relations we consider in this work include friend (usrA, usrB), spouse (usrA, usrB) and LiveInSamePlace (usrA, usrB).

Friend: Twitter supports two types of following patterns, following and followed. We consider two people as friends if they both follow each other (i.e. bidirectional following). Thus if relation Friend(usrA,usrB) holds, usrA has to be both following and followed by usrB. The friend relation is extracted straightforwardly from the Twitter network.

Spouse/Boyfriend/Girlfriend: For the spouse relation, we again turn to Li et al.’s system [48]. For any two given Twitter users and their published contents, the system returns a score in the range [0,1] indicating how likely the Spouse(usr1, usr2) relation is to hold. We use a threshold of 0.5: for any pair of users with a score higher than 0.5, we use a continuous variable to denote the confidence, the value of which is computed by linearly projecting the score into [0,1].

LiveInSamePlace: Straightforwardly inferred from the location extraction approach described in Section 2.2.1.
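
A minimal sketch of the Friend and Spouse predicates follows. The data layout (`following` as a dict of sets) and both function names are assumptions. The text does not give the exact form of the linear projection, so the mapping of (0.5, 1] onto (0, 1] used here is one plausible reading.

```python
def mutual_friends(following):
    """Return the set of unordered user pairs that follow each other.

    following: dict mapping a user to the set of users they follow
    (hypothetical layout).
    """
    pairs = set()
    for a, followed in following.items():
        for b in followed:
            if a != b and a in following.get(b, set()):
                pairs.add(frozenset((a, b)))
    return pairs


def spouse_confidence(score, threshold=0.5):
    """Map a classifier score above the threshold linearly into (0, 1]."""
    if score <= threshold:
        return None  # relation is not asserted
    # Assumed projection: threshold -> 0, 1.0 -> 1.
    return (score - threshold) / (1.0 - threshold)
```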

2.4 User Preferences: Like and Dislike

We now turn to user preferences and attitudes, a central focus of our work, and specifically the predicates like(usr, entity) and dislike(usr, entity). Like the large literature on sentiment analysis from social media (e.g., [1, 39, 65, 79]), our goal is to extract sentiment, but in addition to extract the target or object of the sentiment. Our work thus resembles other work on sentiment target extraction ([11, 36, 91]) using supervised classifiers or sequence models based on manually-labeled datasets. Unfortunately, manually collecting training data in this task is problematic because (1) tweets talking about what the user likes/dislikes are very sparsely distributed among the massive number of topics people discuss on Twitter and (2) tweets expressing what the user likes exist in a great variety of scenarios and forms.

To deal with data sparsity issues, we collect training data by combining semi-supervised information harvesting techniques [16, 40, 41, 47] and the concept of distant supervision [15, 24, 57] as follows:

Semi-supervised information harvesting: We applied the standard seed-based information-extraction method of obtaining training data recursively by using seed examples to extract patterns, which are used to harvest new examples, which are further used as new seeds to train new patterns. We begin with pattern seeds including “I like/love/enjoy (entity)”, “I hate/dislike (entity)”, “(I think) (entity) is good/terrific/cool/awesome/fantastic”, and “(I think) (entity) is bad/terrible/awful/suck/sucks”. Entities extracted here should be nouns, as determined by a Twitter-tuned POS package [64].
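
The seed patterns can be approximated with regular expressions, as in the sketch below. This is a simplification under stated assumptions: entities are restricted to a single token, the noun check from the POS tagger is omitted, the "suck/sucks" variant is left out, and the names `SEED_PATTERNS` and `match_seeds` are illustrative.

```python
import re

# (label, pattern) pairs approximating the seed patterns in the text.
SEED_PATTERNS = [
    ("like", re.compile(r"\bI (?:like|love|enjoy) (\w+)", re.IGNORECASE)),
    ("dislike", re.compile(r"\bI (?:hate|dislike) (\w+)", re.IGNORECASE)),
    ("like", re.compile(
        r"\b(\w+) is (?:good|terrific|cool|awesome|fantastic)\b", re.IGNORECASE)),
    ("dislike", re.compile(
        r"\b(\w+) is (?:bad|terrible|awful)\b", re.IGNORECASE)),
]


def match_seeds(tweet):
    """Return a list of (label, entity) pairs matched by the seed patterns."""
    hits = []
    for label, pattern in SEED_PATTERNS:
        for match in pattern.finditer(tweet):
            hits.append((label, match.group(1).lower()))
    return hits
```

Patterns of this kind are precise but cover few tweets, which is why they serve only as the starting point for harvesting.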

Based on the harvested examples from each iteration, we train 3 machine learning classifiers:

  • A tweet-level SVM classifier (tweet-model 1) to distinguish between tweets that intend to express like/dislike properties and tweets for all other purposes.

  • A tweet-level SVM classifier (tweet-model 2) to distinguish between like and dislike. (We also investigated a 3-class classifier for like, dislike and not-related, but found that it consistently underperforms separate classifiers.)

  • A token-level CRF sequence model (entity-model) to identify entities that are the target of the user's like/dislike.

The SVM classifiers are trained using the SVM package [34] with the following features:

  • Unigram, bigram features with corresponding part-of-speech tags and NER labels.

  • Dictionary-derived features based on a subjectivity lexicon [90].

The CRF model [43] is trained using the CRF++ package (https://code.google.com/p/crfpp/) based on the following features:

  • Current word, context words within a window of 3 words and their part-of-speech tags.

  • Named entity tags and corresponding POS tags.

  • Capitalization and word shape.

Trained models are used to harvest more examples, which are further used to train updated models. We do this iteratively until the stopping condition is satisfied.

Distant Supervision: The main idea of distant supervision is to obtain labeled data by drawing on some external sort of evidence. The evidence may come from a database (for example, if a database says the relation IsCapital holds between Britain and London, then all sentences mentioning “Britain” and “London” are treated as expressing the IsCapital relation [57, 75]) or from common-sense knowledge (for example, tweets with happy emoticons such as :-) or : ) are of positive sentiment [24]). In this work, we assume that if a relation Like(usr, entity) holds for a specific user, then many of their published tweets mentioning the entity also express the Like relationship and are therefore treated as positive training data. Since semi-supervised approaches heavily rely on seed quality [41] and the patterns derived by the recursive framework may be strongly influenced by the starting seeds, adding in examples from distant supervision helps increase the diversity of positive training examples.

An overview of our algorithm showing how the semi-supervised approach is combined with distant supervision is illustrated in Figure 1.

  Begin
Train tweet classification model (SVM) and entity labeling model (CRF) based on positive/negative data harvested from starting seeds.
While stopping condition not satisfied:

  1. Run classification model and labeling model on raw tweets. Add newly harvested positive tweets and entities to the positive dataset.

  2. For any user usr and entity entity, if relation like(usr, entity) holds, add all posts published by usr mentioning entity to the positive training data.

End

Figure 1: Algorithm for training data harvesting for extracting user like/dislike preferences.
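
The loop in Figure 1 can be sketched as follows. This is an illustrative skeleton, not the authors' code: the SVM and CRF models are replaced by two pluggable callables (`train`, `extract`), "mentioning" is approximated by a substring match, and the toy functions at the bottom exist only to make the example runnable.

```python
from collections import defaultdict


def harvest_training_data(raw_tweets, seeds, train, extract, should_stop):
    """Iteratively grow a set of positive (user, text, entity) triples.

    raw_tweets:  list of (user, text) pairs
    train:       callable(positive) -> model           (stand-in for SVM + CRF)
    extract:     callable(model, text) -> entity or None
    should_stop: callable(model, rounds) -> bool       (e.g. dev-set F1 check)
    """
    positive = set(seeds)
    model = train(positive)
    rounds = 0
    while not should_stop(model, rounds):
        # Step 1: run the current models on raw tweets and keep new positives.
        for user, text in raw_tweets:
            entity = extract(model, text)
            if entity is not None:
                positive.add((user, text, entity))
        # Step 2 (distant supervision): if like(usr, entity) holds, add all
        # of that user's posts that mention the entity.
        liked = defaultdict(set)
        for user, _, entity in positive:
            liked[user].add(entity)
        for user, text in raw_tweets:
            for entity in liked.get(user, ()):
                if entity in text.lower():
                    positive.add((user, text, entity))
        model = train(positive)
        rounds += 1
    return model, positive


# Toy stand-ins, for illustration only.
def toy_train(positive):
    return {entity for _, _, entity in positive}


def toy_extract(model, text):
    words = text.lower().split()
    if "love" in words[:-1]:
        return words[words.index("love") + 1]
    return None


raw = [("u1", "I love sushi"), ("u1", "sushi again tonight"), ("u2", "sushi is ok")]
model, positive = harvest_training_data(
    raw, set(), toy_train, toy_extract, lambda m, r: r >= 1)
```

In the toy run, the second tweet by `u1` is added through distant supervision even though it matches no pattern, while the tweet by `u2` is not, because no like relation is known for that user.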

Stopping Condition: To decide the optimum number of steps for the algorithm to stop, we manually labeled a dataset which contains 200 positive tweets (100 like and 100 dislike) with entities, selected from the original raw tweet dataset rather than the automatically harvested data. The positive dataset is matched with 800 negative tweets. For each iteration of data harvesting, we evaluate the performance of the classification models and labeling model on this human-labeled dataset, which can be viewed as a development set for parameter tuning. Results are reported in Table 1. As can be seen, the precision score decreases as the algorithm iterates, but the recall rises. The best F1 score is obtained at the end of the third round of iteration.

Iteration     Model           Pre    Rec    F1
iteration 1   tweet-model 1   0.86   0.40   0.55
              tweet-model 2   0.87   0.84   0.85
              entity label    0.83   0.40   0.54
iteration 2   tweet-model 1   0.78   0.57   0.66
              tweet-model 2   0.83   0.86   0.84
              entity label    0.79   0.60   0.68
iteration 3   tweet-model 1   0.76   0.72   0.74
              tweet-model 2   0.87   0.86   0.86
              entity label    0.77   0.72   0.74
iteration 4   tweet-model 1   0.72   0.74   0.73
              tweet-model 2   0.82   0.82   0.82
              entity label    0.74   0.70   0.72
Table 1: Performance on the manually-labeled devset at different iterations of data harvesting.

For evaluation purposes, data harvesting without distant supervision (no-distant) naturally constitutes a baseline. Another baseline we employ is to train a one-step CRF model (one-step-crf) which directly decides whether a specific token corresponds to a like/dislike entity rather than making a tweet-level decision first. Both no-distant and one-step-crf rely on our recursive framework and tune the number of iterations on the aforementioned gold standards. Our testing dataset comprises an additional 100 like/dislike related tweets (50 like and 50 dislike) with entity labels, which are then matched with 400 negative tweets. The last baseline we employ is the rule-based extraction approach using the seed patterns. We report the best-performing model on end-to-end entity extraction precision and recall. Note that the end-to-end evaluation setting here is different from that in Table 1: if the tweet-level models make an erroneous decision, labels assigned at the entity level are treated as wrong.

Model            Pre    Rec    F1
semi+distant     0.73   0.64   0.682
no-distant       0.70   0.65   0.674
one-step (CRF)   0.67   0.64   0.655
rule             0.80   0.30   0.436
Table 2: Performances of different models on extraction of user preferences (like/dislike) toward entities.
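
The F1 column can be checked directly from the precision and recall columns, since F1 is their harmonic mean. The helper name `f1_score` below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows of Table 2 as (precision, recall) pairs.
table2 = {
    "semi+distant": (0.73, 0.64),
    "no-distant": (0.70, 0.65),
    "one-step (CRF)": (0.67, 0.64),
    "rule": (0.80, 0.30),
}
table2_f1 = {name: round(f1_score(p, r), 3) for name, (p, r) in table2.items()}
```

The recomputed values agree with the reported F1 scores to three decimal places.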

As can be seen from Table 2, incorporating user-entity information from distant supervision improves precision by about three points (0.73 vs. 0.70) at nearly unchanged recall, giving a smaller gain in F1 (0.682 vs. 0.674). Modeling tweet-level and entity-level information separately yields better performance than incorporating them in a unified model (one-step-crf).

We apply the model trained in this subsection to our tweet corpora. We filter out entities that appear fewer than 20 times, resulting in roughly 40,000 different entities in the dataset (consecutive entities with the same type of NER label are merged).

Entity Clustering: We further cluster the extracted entities into different groups, with the goal of answering questions like ‘if usr1 likes films, how likely would she like the film Titanic?’

Towards this goal, we train a skip-gram neural language model [55, 56] on the tweet dataset using word2vec, where each word is represented as a real-valued, low-dimensional vector (the word embedding dimension is set to 200). Skip-gram language models draw on local context in order to learn similar embeddings for semantically similar words. Next we run a k-means clustering algorithm (k=20) on the extracted entities, using L2 distance. From the learned clusters, we manually selected 12 sensible ones, including food, sports, TV and movies, politics, electronic products, albums/concerts/songs, travels, books, fashions, financial stuff, and pets/animals. Each of the identified clusters is then matched with a human label.
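
For illustration, k-means with L2 distance can be written in a few lines of plain Python. This sketch uses tiny 2-dimensional toy vectors rather than 200-dimensional word2vec embeddings, and the seeded random initialization and fixed iteration count are assumptions; the paper does not describe these details.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (assignments, centroids). Distances are squared L2, which
    gives the same nearest centroid as L2.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy "embeddings": two well-separated groups of entities.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
toy_assignments, toy_centroids = kmeans(toy_points, k=2)
```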

Like Attribute Extracted from Network:

We extract more like/dislike preferences by using the follower network of Twitter. If a Twitter user usrB is followed by the current user usrA, but not bidirectionally, and usrB has more than 100,000 followers, we treat usrB as a public figure/celebrity that is liked by the current user usrA.
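
The celebrity rule amounts to a simple filter over the follower graph. In the sketch below, the `following` and `follower_counts` structures and the function name `celebrity_likes` are hypothetical; the 100,000-follower threshold is the one given in the text.

```python
def celebrity_likes(user, following, follower_counts, min_followers=100_000):
    """Return the accounts treated as celebrities liked by `user`.

    following:       dict user -> set of accounts the user follows
    follower_counts: dict account -> number of followers
    """
    liked = set()
    for other in following.get(user, set()):
        followed_back = user in following.get(other, set())
        # One-directional follow of an account with a large audience.
        if not followed_back and follower_counts.get(other, 0) > min_followers:
            liked.add(other)
    return liked
```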

3 Logic Networks

In this section, we describe MLN and PSL, which have been widely applied in relational learning and logic reasoning.

3.1 Markov Logic

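Markov Logic [71] is a probabilistic logic framework which encodes weighted first-order logic formulas in a Markov network. Translated to logic, the expression "people from Illinois like the NFL football team Chicago Bears" can be written as:

$$\forall x\; \text{LiveIn}(x, \text{Illinois}) \Rightarrow \text{Like}(x, \text{Chicago Bears}) \qquad (1)$$

Real world predicates are first converted to symbols using logical connectives and quantifiers. In MLN, each of the predicates (e.g., LiveIn and Like) corresponds to a node and each formula $f_i$ is associated with a weight $w_i$. The framework optimizes the following probability:

$$P(X = x) = \frac{1}{Z} \exp\Big(\sum_i w_i\, n_i(x)\Big) \qquad (2)$$

where $Z$ denotes the normalization factor and $x$ denotes the states of nodes in the network. In our earlier example, writing $l_1$ for LiveIn(x, Illinois) and $l_2$ for Like(x, Chicago Bears), $x$ could take the following 4 values: $(l_1, l_2)$, $(\neg l_1, l_2)$, $(l_1, \neg l_2)$ and $(\neg l_1, \neg l_2)$. $n_i(x)$ is the number of true groundings of formula $f_i$ in state $x$. Consider the simple logic network shown in Eq. 1 with weight $w$. Given the logic rule that $l_1 \Rightarrow l_2$ is true iff $l_1$ is false or $l_2$ is true, we have $P(l_1, \neg l_2) = 1/(1 + 3e^{w})$, and the probability of each of the other three states is $e^{w}/(1 + 3e^{w})$.

For inference, the probability of predicate $L_i$ given the rest of the predicates $L_{rest}$ is written as:

$$P(L_i \mid L_{rest}) = \frac{P(L_i \wedge L_{rest})}{P(L_{rest})} \qquad (3)$$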

Many approaches have been proposed for fast and effective learning for MLNs [53, 63, 82]. In this work, we use the discriminative training approach [82], as will be demonstrated in Section 4.1.
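
The single-rule example of this section can be checked numerically. The sketch below enumerates the four joint states of two ground predicates connected by one weighted implication (written here as l1 ⇒ l2) and normalizes exp(w · n(x)), where n(x) is 1 when the implication holds and 0 otherwise. The function name `single_rule_mln` is illustrative, and the code covers only this toy network, not general MLN inference.

```python
import math
from itertools import product


def single_rule_mln(w):
    """Distribution over (l1, l2) for one formula l1 => l2 with weight w."""
    scores = {}
    for l1, l2 in product([True, False], repeat=2):
        # The implication is violated only when l1 is true and l2 is false.
        n_true = 1 if (not l1) or l2 else 0
        scores[(l1, l2)] = math.exp(w * n_true)
    z = sum(scores.values())  # normalization factor Z = 1 + 3 * exp(w)
    return {state: score / z for state, score in scores.items()}


probs = single_rule_mln(1.0)
```

With a positive weight, the violating state (l1 true, l2 false) is the least probable one, and with w = 0 all four states are equally likely.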

3.2 Probabilistic Soft Logic

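PSL [4, 37] is another sort of logic reasoning architecture. It first associates each predicate $l$ with a soft truth value $I(l) \in [0, 1]$. Based on such soft truth values, PSL performs logical conjunction and disjunction in the following ways:

$$I(l_1 \wedge l_2) = \max\{0,\; I(l_1) + I(l_2) - 1\}, \qquad I(l_1 \vee l_2) = \min\{1,\; I(l_1) + I(l_2)\} \qquad (4)$$

Next, a given formula $l_1 \Rightarrow l_2$ is said to be satisfied if $I(l_1) \le I(l_2)$. PSL defines a variable $d(r)$, the ‘distance to satisfaction’, to capture how far rule $r$ is from being true. $d(r)$ is given by $\max\{0,\; I(l_1) - I(l_2)\}$. For example, if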

(5)

and

(6)

then

(7)

PSL is optimized by maximizing the satisfaction of the observed rules in terms of the distance to satisfaction $d(r)$ of each rule $r$:

$$P(I) = \frac{1}{Z} \exp\Big(-\sum_{r} \lambda_r\, d(r)\Big) \qquad (8)$$

where $Z$ denotes the normalization factor and $\lambda_r$ denotes the weight for formula $r$. Inference can be straightforwardly performed by calculating the distance between the predicates. Compared with MLN, the PSL framework can be efficiently optimized based on a linear program. Another key distinguishing feature of PSL is that it uses continuous variables (soft truth values) rather than the binary ones used in MLN.
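
The soft-logic operators and the distance to satisfaction can be made concrete with a small sketch. The truth values used in the example (1.0, 0.75 and 0.5) and the rule weight 2.0 are hypothetical and are not taken from the paper; the function names are illustrative as well.

```python
import math


def soft_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def soft_or(a, b):
    """Lukasiewicz disjunction of two soft truth values in [0, 1]."""
    return min(1.0, a + b)


def distance_to_satisfaction(body, head):
    """How far the rule body => head is from being satisfied."""
    return max(0.0, body - head)


def unnormalized_density(weighted_distances):
    """exp(-sum(weight * distance)) over (weight, distance) pairs."""
    return math.exp(-sum(w * d for w, d in weighted_distances))


# Hypothetical rule: friend(A, B) AND like(B, soccer) => like(A, soccer)
body = soft_and(1.0, 0.75)                  # I(friend) = 1.0, I(like(B)) = 0.75
dist = distance_to_satisfaction(body, 0.5)  # I(like(A)) = 0.5
density = unnormalized_density([(2.0, dist)])
```

Here the body has truth value 0.75 and the head 0.5, so the rule is 0.25 away from satisfaction; raising the head to 0.75 or more would bring the distance to zero.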

4 Logic Reasoning on Social Networks

Based on our extraction algorithm in Section 2, each user is associated with a list of attributes and preferences, and is related by various relations to other users in a network. Function symbols are transformed to predicates for graph construction, where all the nodes in the graph take on binary values (i.e., true or false).

4.1 Assumptions and Simplifications

As existing algorithms might be difficult to scale up to the number of users and attributes we consider, we make some assumptions to enable faster learning and inference:

Cut off Edges: If relations like(usrA, entity1), like(usrB, entity2) and friend(usrA, usrB) hold, but entity1 and entity2 are from different like-entity categories, we treat like(usrA, entity1) and like(usrB, entity2) as independent, which means there is no edge connecting the nodes like(entity1) and like(entity2) in the Markov network. As an example, if usrA likes fish and usrB likes football, then since fish and football belong to different entity categories, we treat these two predicates as independent.
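
The edge-cutting assumption reduces to a category check when the graph is built. In this sketch, `category_of` is a hypothetical mapping from entities to the clusters of Section 2.4, and treating entities without a category as unconnected is an added assumption.

```python
def should_connect(entity1, entity2, category_of):
    """Keep an edge between two like-nodes only if the entities share a category."""
    c1 = category_of.get(entity1)
    c2 = category_of.get(entity2)
    # Assumption: entities with no known category are never connected.
    return c1 is not None and c1 == c2
```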

Discriminative Training for MLN: We use the approach described in [82], where we assume that we have a priori knowledge about which predicates will be evidence and which ones will be queried. Instead of optimizing over all nodes in the graph, the system optimizes the probability of predicting the queried nodes given the evidence nodes. This prunes a large number of branches. Let $Y$ be the set of queried nodes and $X$ be the evidence nodes; the system optimizes the conditional probability as follows:

$$P(Y = y \mid X = x) = \frac{1}{Z_x} \exp\Big(\sum_{i \in F_Y} w_i\, n_i(x, y)\Big) \qquad (9)$$

where $F_Y$ denotes all cliques with at least one node involving a query node.

4.2 Modeling Missing Values

A major challenge is missing values, for example in situations where users do not mention an entity; a user not mentioning one entity does not necessarily mean they do not like it. Consider the following situation:

$$\text{friend}(A, B) \wedge \text{like}(B, \text{soccer}) \Rightarrow \text{like}(A, \text{soccer}) \qquad (10)$$

Drawing this inference requires that (1) usrB indeed likes soccer and (2) usrB explicitly mentions soccer in his or her posts. Satisfying both premises (especially the latter one) is a luxury. Inspired by common existing approaches to deal with missing data [44, 51], we treat users’ like/dislike preferences as latent variables, while what is observed is whether users explicitly mention their preferences in their posts. The latent variables and observed variables are connected via a binary distribution parameterized by a variable in [0,1], indicating how likely a user would be to report the corresponding entity in their posts.
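
The effect of treating a preference as latent can be illustrated with Bayes' rule for a single user and entity. The sketch assumes a prior probability of liking the entity, a reporting probability for users who do like it, and no mentions from users who do not like it; these modeling choices and the function name are illustrative and are not the EM procedure used for MLN.

```python
def posterior_like_given_silence(prior_like, report_prob):
    """P(like | entity never mentioned) under a simple reporting model.

    prior_like:  prior probability that the user likes the entity
    report_prob: probability that a user who likes it mentions it
    Assumption: a user who does not like the entity never reports liking it.
    """
    silent_and_like = prior_like * (1.0 - report_prob)
    silent_and_not_like = 1.0 - prior_like
    total = silent_and_like + silent_and_not_like
    if total == 0.0:
        return 0.0  # silence is impossible under these parameters
    return silent_and_like / total
```

With a prior of 0.5 and a reporting probability of 0.5, silence lowers the probability of liking only to 1/3 rather than to zero, which is the behavior the latent-variable treatment is meant to capture.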

For MLN, a brief illustration is shown in Figure 2. The conditional probability can be expressed by summing over latent variables. The system can be optimized by incorporating a form of EM algorithm into MLN [83].

For PSL, each entity is associated with an additional predicate mention(usr, entity), denoting the situation where a given user publishes posts about one specific entity. The predicate publish-entity(usr) comes with the following constraints:

  • like-entity(usr)publish-entity(usr)=0

  • dislike-entity(usr)publish-entity(usr)=0

which can be interpreted as saying that a user would mention his like or dislike towards an entity only if he likes or dislikes it.

Figure 2: (a) Standard Approach (b) Revised version with missing values in MLN.

4.3 Inference

Inference is performed in two settings: friend-observed and friend-latent (we draw on a similar idea in [47]). friend-observed addresses leave-one-out testing, inferring one specific attribute or relation given all the rest. friend-latent refers to another scenario where some of the attributes (or other information) for multiple users along the network are missing and require joint inference over multiple values along the graph. Real world applications, where network information can be partly retrieved, likely fall in between.

Inference for the friend-observed setting is performed directly with the standard MLN and PSL inference frameworks, implemented using MCMC for MLN and MPE (Most Probable Explanation) for PSL. For the friend-latent setting, we need to jointly infer location attributes across users. As the objective function for joint inference would be difficult to optimize (especially since inference on MLN is hard) and existing algorithms may not be able to scale up to the size of network we consider, we turn to a greedy approach inspired by recent work [48, 68]: attributes are initialized from the logic network based on given attributes, where missing values are not considered. Then for each user along the network, we iteratively re-estimate their attributes given the evidence both from their own attribute values and from their friends by performing standard inference in MLN or PSL. In this way, highly confident predictions are made based on individual features in the first round, and user-user relations then either support or contradict these decisions. We run 3 rounds of iterations. We expect friend-observed to yield better results than the friend-latent setting since the former benefits from gold network information [47].
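
The greedy procedure for the friend-latent setting can be sketched as a loop over users. The sketch is illustrative only: MLN or PSL inference is replaced by a pluggable callable, a majority vote over friends stands in for it in the toy example, and keeping observed attributes fixed is an assumption.

```python
from collections import Counter


def greedy_joint_inference(users, friends, observed, local_infer, rounds=3):
    """Iteratively re-estimate missing attributes from friends' estimates.

    users:       list of user ids
    friends:     dict user -> set of friends
    observed:    dict user -> attribute value, or None when missing
    local_infer: callable(own_value, friend_values) -> value
                 (stand-in for standard MLN or PSL inference)
    """
    # Initialization: missing values stay None and are not used as evidence.
    current = {u: observed.get(u) for u in users}
    for _ in range(rounds):
        for user in users:
            if observed.get(user) is not None:
                continue  # assumption: observed attributes are kept fixed
            friend_values = [
                current[f] for f in friends.get(user, ())
                if current.get(f) is not None
            ]
            current[user] = local_infer(current[user], friend_values)
    return current


def majority_vote(own_value, friend_values):
    """Toy stand-in: adopt the most common value among friends."""
    if not friend_values:
        return own_value
    return Counter(friend_values).most_common(1)[0][0]


toy_users = ["a", "b", "c"]
toy_friends = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
toy_observed = {"a": "Ohio", "b": None, "c": None}
toy_result = greedy_joint_inference(toy_users, toy_friends, toy_observed, majority_vote)
```

In the toy chain a, b, c, the observed location of `a` propagates to `b` and then to `c`, showing how estimates for one user become evidence for the next.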

5 Experiments

We now turn to our experiments on using global inference across the logic networks to augment the individual local detectors to infer user attributes, user relations and finally user preferences. These results are based on the datasets extracted in the previous section, where each user is represented with a series of extracted attribute values (e.g., like/dislike, location, gender) and users are connected along the social network. We use of the data as training corpus, reserving for testing, from which we respectively extract testing data for each relation, attribute, or preference, as described below.

In each case, our goal is to understand whether global probabilistic logical inference over the entire social network graph improves over baseline classifiers like SVMs that use only local features.

5.1 User Attributes: Location

The goal of location inference is to identify the US state the user tweets from, out of the 50 states. Evaluation is performed on the subset of users for which our rule-based approach in Section 2 identified a gold-standard location with high precision. We report on two settings. The friend-latent setting makes joint predictions for user locations across the network while the more precise friend-observed setting predicts the locations of each user given all other attributes, relations, and preferences. Baselines we employ include:

  • Random: Assign a location attribute sampled from a distribution based on state population (http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population).

  • Unified: Assign the most populous US state (California) to each user.

  • SVM and Naive Bayes: Train multi-class classifiers where features are the predicted extracted attributes and network information. Features we consider include individual information and network information. The former encodes the presence/absence of the entities that a user likes or dislikes, and job/education attributes (if information is included in the dataset). The latter includes

    • The proportion of friends that take one specific value for each attribute. For the attribute LiveIn-illinois, for example, the feature value for the SVM is the number of the user's friends who live in Illinois divided by the user's total number of friends. (For probabilistic attributes, e.g., education, values are weighted by their probabilities.)

    • The presence/absence of spouse attribute (if spouse user is identified).

  • Only-network: A simplified version of the model which only relies on relations along the network.

  • Only-like: A simplified version of the model which only relies on individual attributes.
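The friend-proportion feature used by the SVM and Naive Bayes baselines can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the dictionary-based network representation, and the toy data are all assumptions, and the per-friend weight stands in for the probability attached to a probabilistic attribute.

```python
from typing import Dict, List


def friend_proportion(
    user: str,
    friends: Dict[str, List[str]],
    attribute_weight: Dict[str, float],
) -> float:
    """Weighted fraction of `user`'s friends for whom an attribute holds.

    `attribute_weight[v]` is a hypothetical extractor confidence in [0, 1]
    that the attribute (e.g. LiveIn-Illinois) holds for friend v; friends
    missing from the map contribute 0.
    """
    neighbours = friends.get(user, [])
    if not neighbours:
        return 0.0  # no observed friends: the feature defaults to 0
    total = sum(attribute_weight.get(v, 0.0) for v in neighbours)
    return total / len(neighbours)


# Toy network (made-up data): two of alice's four friends live in Illinois.
friends = {"alice": ["bob", "carol", "dave", "erin"]}
lives_in_illinois = {"bob": 1.0, "carol": 1.0, "dave": 0.0}
print(friend_proportion("alice", friends, lives_in_illinois))  # 0.5
```

One such value would be computed per attribute value and per user, and the resulting vector concatenated with the individual features before training the classifier.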

Model Acc Model Acc
Random 0.093 Unified 0.114
SVM 0.268 Naive Bayes 0.280
only-network (MLN) 0.272 only-like (MLN) 0.258
friend-observed (MLN) 0.342 friend-latent (MLN) 0.298
friend-observed (PSL) 0.365 friend-latent (PSL) 0.282
Table 3: Accuracy of different models for predicting location.
Form Value
(1) 17.8
(2) 2.8
(3) 7.2
(4) 4.2
Table 4: Examples for Locations.

The performance of the different models is shown in Table 3. (This is a 50-class classification problem; accuracy for random assignment without prior knowledge is 0.02.) As expected, friend-observed outperforms friend-latent, detecting locations with an accuracy of about 0.35. The only-network and only-like models, where evidence is only partially considered, consistently underperform the settings where evidence is fully considered. Logic networks, which are capable of capturing the complicated interactions between factors and features, yield better performance than traditional SVM and Naive Bayes classifiers.

Table 4 gives some examples based on conditional probabilities calculated from the MLN, which correspond respectively to: (1) people from Illinois like the Chicago Bears; (2) people from Alabama like barbecue; (3) People from hockey like hockey; (4) people from North Dakota like Krumkake.

5.2 User Attributes: Gender

We evaluate gender prediction on a dataset of 10,000 users (half male, half female) drawn from the users whose gold-standard gender was assigned with sufficiently high precision by the social-security-informed system in Section 2. We focus only on the neigh-observe setting. The SVM baseline takes individual and network features as described in Section 5.1. Table 5 shows the results. Using the logic networks across all attributes, relations, and preferences, the MLN reaches a precision of 0.772 and an F1 of 0.761.

Model Pre Rec F1
MLN 0.772 0.750 0.761
PSL 0.742 0.761 0.751
SVM 0.712 0.697 0.704
Table 5: Performances for Male/Female prediction.
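As a quick consistency check on Table 5, the F1 column can be recomputed from the precision and recall columns. The sketch below assumes the standard harmonic-mean definition of F1, which the paper does not state explicitly; the function name is ours.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 5
table5 = {
    "MLN": (0.772, 0.750, 0.761),
    "PSL": (0.742, 0.761, 0.751),
    "SVM": (0.712, 0.697, 0.704),
}

for model, (p, r, reported) in table5.items():
    # Print the recomputed value next to the reported one.
    print(model, round(f1_score(p, r), 3), reported)
```

Under that assumption, the recomputed values agree with the reported F1 scores to three decimal places.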

Of course, the performance of the algorithm could very likely be even higher if we were to additionally incorporate features designed directly for the gender ID task (such as entities mentioned, links, and especially the wide variety of writing-style features used in work such as [12], which achieves gender ID accuracies of 0.85 on a different dataset). Nonetheless, the fact that global probabilistic inference over the network of attributes and relations achieves such high accuracies without any such features points to the strength of the network approach.

Table 6 gives some examples of gender preferences inferred from the MLN. As can be seen, males prefer sports while females prefer fashion (as expected). Females place somewhat more emphasis on food and movies than males, but the difference is not significant.

Form Value
16.9
18.0
2.1
1.6
Table 6: Examples for Genders.

5.3 Predicting Relations Between Users

We tested relation prediction on the detection of the three relations defined in Section 2: friend, spouse and LiveInSameLocation. Positive training data is selected from pairs of users between whom one specific type of relation holds, while random user pairs are used as negative examples. We weighted the data toward negative examples to match the natural distribution. Statistics are shown in Table 7.

Relation Positive Negative
Friend 20,000 80,000
Spouse 1,000 5,000
LiveInSameLocation 5,000 20,000
Table 7: Dataset statistics for relation prediction.

For relation evaluation, we focus only on the neigh-observe setting. Decisions are made by comparing the conditional probability that a specific relation holds given the other types of information with the probability that it does not hold, for example Pr(Spouse(A,B) | ·) and 1 − Pr(Spouse(A,B) | ·), where · denotes the remaining evidence. Baselines we employ include:

  • SVM: We use the co-occurrence of attributes as features.

    For LiveInSameLocation prediction, the location identification classifier without any global information naturally constitutes an additional baseline, SVM (location), in which two users are viewed as living in the same location if the classifiers trained in Section 5.1 assign them the same location label.

  • Random: Assign labels randomly based on the proportion of positive examples. We report the theoretical values of precision and recall, both of which equal the proportion of positive examples in the dataset.
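The theoretical figures for the Random baseline can be reproduced from the dataset sizes in Table 7. The sketch below assumes that each pair is labelled positive independently with probability equal to the proportion of positive examples, and it takes the ratio of expected counts; the function name is ours.

```python
from typing import Tuple

# Dataset sizes copied from Table 7: relation -> (positive, negative)
table7 = {
    "Friend": (20_000, 80_000),
    "Spouse": (1_000, 5_000),
    "LiveInSameLocation": (5_000, 20_000),
}


def random_baseline(positive: int, negative: int) -> Tuple[float, float]:
    """Expected (precision, recall) of labelling each pair positive
    independently with probability q = positive / (positive + negative)."""
    total = positive + negative
    q = positive / total
    expected_true_positives = q * positive
    expected_predicted_positives = q * total
    precision = expected_true_positives / expected_predicted_positives
    recall = expected_true_positives / positive
    return precision, recall


for relation, (pos, neg) in table7.items():
    p, r = random_baseline(pos, neg)
    print(relation, round(p, 3), round(r, 3))
```

For the sizes in Table 7 this gives 0.200, 0.167 and 0.200 for both precision and recall, matching the Random rows of Table 8.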

The performance of the different approaches is reported in Table 8. As can be seen, the logic models consistently yield better performance than SVMs on relation prediction tasks, due to their power in leveraging the global interactions between different sources of evidence.

Relation Model Pre Rec F1
Friend MLN 0.580 0.829 0.682
Friend PSL 0.531 0.850 0.653
Friend SVM 0.401 0.745 0.521
Friend Random 0.200 0.200 0.200
Spouse MLN 0.680 0.632 0.655
Spouse PSL 0.577 0.740 0.648
Spouse SVM 0.489 0.600 0.539
Spouse Random 0.167 0.167 0.167
LiveInSameLocation MLN 0.592 0.704 0.643
LiveInSameLocation PSL 0.650 0.741 0.692
LiveInSameLocation SVM 0.550 0.721 0.624
LiveInSameLocation SVM (location) 0.504 0.695 0.584
LiveInSameLocation Random 0.200 0.200 0.200
Table 8: Performances for Relation Prediction. The values for Random are theoretical results.

5.4 Predicting Preference: Likes or Dislikes

Evaluating our ability to detect user preferences is complex since, as mentioned in Section 4.2, we don’t have gold-standard labels of Twitter users’ true attitudes towards different entities. That is, we don’t know what users actually like, only what they say they like. We therefore evaluate our ability to detect what users say about an entity.

We evaluate two distinct tasks, beginning with the simpler: given that the user talked about an entity, was their opinion positive or negative?

We then proceed to the much more difficult task of predicting whether a user will talk about an entity at all, and if so whether her opinion will be positive or negative.

In both tasks we make our predictions without using the text of the message. This is because our goal is to understand how useful the social network structure alone is in solving the problem. This information could then easily be combined with standard sentiment-analysis techniques in future work.

Evaluations are performed under both the friend-observed setting and the friend-latent setting.

5.4.1 Predicting Like/Dislike

We begin with the scenario in which we know that an entity has already been mentioned by a user, and we try to predict the user’s attitude towards it without looking at text-level evidence. The goal of this experiment is to predict sentiment (e.g., whether one likes Barack Obama) given other types of attributes of the user himself (e.g., where he lives) or his network (e.g., whether his friends hate Mitt Romney), but without using sentiment-analysis features of the text itself.

We created a test set in which the like/dislike preferences are expressed toward entities (extracted in Section 2) that are frequent in our database, for a total of 92 distinct entities (e.g., BarackObama, New York Knicks). We extracted 1000 like examples and 1000 dislike examples (e.g., like-BarackObama(user), dislike-BarackObama(user)).

Predictions are made by comparing Pr(like-entity(user, entity) | ·) and Pr(dislike-entity(user, entity) | ·), where · denotes the remaining evidence. We extracted gold standards for each data point; since the test set is balanced, random guessing yields an accuracy of 0.5. Evaluations are performed in terms of precision and recall. We consider only the neigh-latent setting.

The baselines we employ include:

  • SVM: We train binary SVM classifiers to decide, for a specific entity, whether a user likes or dislikes it. Features include individual attribute values (e.g., like/dislike, location, gender) and network information (attributes of the user's friends along the network).

  • Collaborative Filtering (CF): CF [18, 25, 33, 35] is a popular approach in recommender systems that uses the information in the user-item matrix to make recommendations. The key idea of CF is to recommend similar items to similar users. We view like/dislike prediction as an entity recommendation problem and adopt the approach described in [80], constructing the user-user similarity matrix from weighted cosine similarity calculated over shared attributes and network information. Entity-entity similarity is computed from entity embeddings (described in Section 2). As in [80], a regression model is trained to fill in the values of the user-entity matrix indicating whether a specific user likes or hates a specific entity. Prediction is then made with a weighted nearest-neighbor algorithm.
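To make the collaborative-filtering baseline more concrete, the sketch below shows the user-user part only: cosine similarity over sparse attribute vectors, followed by a similarity-weighted vote among the k most similar users who expressed an opinion. It is a deliberate simplification under our own assumptions; it omits the regression step and the entity-entity similarity of [80], and the function names, the +1/-1 rating encoding, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_like(
    target: str,
    entity: str,
    profiles: Dict[str, Vector],
    ratings: Dict[str, Dict[str, int]],
    k: int = 2,
) -> float:
    """Similarity-weighted average of the k nearest neighbours' ratings
    (+1 = like, -1 = dislike) for `entity`; a score > 0 is read as "like"."""
    scored: List[Tuple[float, int]] = []
    for other, profile in profiles.items():
        if other == target or entity not in ratings.get(other, {}):
            continue  # skip the target and users with no opinion on the entity
        scored.append((cosine(profiles[target], profile), ratings[other][entity]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0.0:
        return 0.0  # no informative neighbour
    return sum(sim * rating for sim, rating in top) / weight


# Toy data (made up): attribute vectors and known opinions on one entity.
profiles = {
    "u1": {"LiveIn-NewYork": 1.0, "gender-male": 1.0},
    "u2": {"LiveIn-NewYork": 1.0, "gender-male": 1.0},
    "u3": {"LiveIn-NewYork": 1.0},
    "u4": {"LiveIn-Texas": 1.0},
}
ratings = {"u2": {"Knicks": 1}, "u3": {"Knicks": 1}, "u4": {"Knicks": -1}}
print(predict_like("u1", "Knicks", profiles, ratings))
```

In this toy example the two users most similar to u1 both like the Knicks, so the score is positive; a user with no attribute overlap with any rater receives a neutral score of 0.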

Results are reported in Table 9. As can be seen, MLN and PSL outperform other baselines.

Model Pre Rec F1
MLN 0.802 0.764 0.782
PSL 0.810 0.772 0.791
SVM 0.720 0.737 0.728
CF 0.701 0.694 0.697
Table 9: Performances for Like/Dislike prediction.

5.4.2 Predicting Mentions of Likes/Dislikes

We are still confronted with the missing value problem of Section 4.2: we don’t know what users actually believe, so we can only try to predict what they will say they believe. But where the previous section assumed we knew that the user talked about an entity and just predicted the sentiment, we now turn to the much more difficult task of predicting both whether a user will mention an entity and what the user’s attitude toward the entity is. Evaluations are again performed under the friend-observed setting and the friend-latent setting.

We construct the testing dataset using a random sample of 2,000 users with an average number of 3.2 like/dislike entities mentioned per user (a total number of 4,300 distinct entities). Baselines we employ include:

  • Random: Estimate the overall popularity of a specific entity, i.e., the probability that it is liked across the whole population.

    The decision is made by sampling a variable from a binary distribution parameterized by this probability.

  • SVM and Naive Bayes: We train SVM and Naive Bayes classifiers to decide whether a specific user would express his like/dislike attitude towards a specific entity. Features include individual attributes values and network information (For feature details, see Section 5.1).

  • Collaborative Filtering (CF): As described in the previous section.
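The Random baseline above can be sketched as a per-entity Bernoulli draw. The estimate used here, the fraction of users who like the entity, is our assumption about how the popularity is computed, and the function names and data are hypothetical.

```python
import random
from typing import Dict, Set


def popularity(likes: Dict[str, Set[str]], num_users: int) -> Dict[str, float]:
    """Fraction of the population that likes each entity."""
    return {entity: len(users) / num_users for entity, users in likes.items()}


def sample_mentions(p: Dict[str, float], rng: random.Random) -> Dict[str, bool]:
    """Draw one Bernoulli decision per entity with success probability p[entity]."""
    return {entity: rng.random() < prob for entity, prob in p.items()}


# Toy data (made up): which users like which entity, in a population of 10.
likes = {"sushi": {"u1", "u2"}, "Knicks": {"u3"}}
p = popularity(likes, num_users=10)
decisions = sample_mentions(p, random.Random(0))  # seeded for repeatability
print(p)
print(decisions)
```

Because most entities are liked by only a tiny fraction of users, such a baseline almost never predicts a mention, which is consistent with the near-zero scores in the Random row of Table 10.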

Performances are evaluated in terms of precision and recall, reported in Table 10.

Note that predicting like/dislike mentions is an extremely difficult task, since users tweet about only a very small percentage of all the entities they like and dislike in the world. Predicting which ones they will decide to talk about requires many more kinds of evidence about the individual and the network than our system has access to.

Nonetheless, given the limited information we have at hand, and considering the great number of entities, our proposed model does surprisingly well, reaching a precision of 0.075 and a recall of 0.120 in the friend-observed setting with PSL (Table 10), and significantly outperforming Collaborative Filtering, SVM and Naive Bayes.

Model Pre Rec F1
Random <0.001 <0.01 <0.01
SVM 0.023 0.037 0.028
Naive Bayes 0.018 0.044 0.025
CF 0.045 0.060 0.051
friend-latent (MLN) 0.054 0.066 0.059
friend-latent (PSL) 0.067 0.061 0.064
friend-observed (MLN) 0.072 0.107 0.086
friend-observed (PSL) 0.075 0.120 0.092
Table 10: Performances of different models for like/dislike mention prediction.
logic form probability from MLN description
friend(A,B) ∧ friend(B,C) ⇒ friend(A,C) 0.082 friends of friends are friends
couple(A,B) ∧ friend(B,C) ⇒ friend(A,C) 0.127 one of a couple and the other’s friend are friends
friend(A,B) ∧ like-sports(A) ⇒ like-sports(B) 0.024 friend of a sports fan likes sports
friend(A,B) ∧ like-food(A) ⇒ like-food(B) 0.018 friend of a food fan likes food
friend(A,B) ∧ like-fashion(A) ⇒ like-fashion(B) 0.030 friend of a fashion fan likes fashion
couple(A,B) ∧ like-sports(A) ⇒ like-sports(B) 0.068 wife/husband of a sports fan likes sports
couple(A,B) ∧ like-food(A) ⇒ like-food(B) 0.085 wife/husband of a food fan likes food
friend(A,B) ⇒ liveinSameplace(A,B) 0.160 friends live in the same location
couple(A,B) ⇒ liveinSameplace(A,B) 0.632 couple live in the same location
Work-In-IT-company(A) ⇒ like-electronic-device(A) 0.242 people who work in IT companies like electronic devices
Student(A) ⇒ like-sports(A) 0.160 student users like sports
Ivy-Student(A) ⇒ like-sports(A) 0.125 student users from Ivy schools like sports
Spouse(A,B) ⇒ Friend(A,B) 0.740 couples are friends
Table 11: Examples of inference probability from the proposed system.
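As a rough illustration of what an entry in Table 11 expresses, the sketch below estimates how often a rule's consequent holds among the pairs that satisfy its antecedent, by direct counting over a toy set of ground atoms. This is only an empirical analogue under our own assumptions: the probabilities in Table 11 come from MLN inference rather than counting, and the function name and data are made up.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def conditional_frequency(antecedent: Iterable[Pair], consequent: Set[Pair]) -> float:
    """Fraction of pairs satisfying the antecedent that also satisfy the consequent."""
    pairs = list(antecedent)
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in consequent) / len(pairs)


# Toy ground atoms (made up): couple(A,B) and liveinSameplace(A,B).
couple = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
same_place = {("a", "b"), ("c", "d"), ("e", "f"), ("x", "y")}
print(conditional_frequency(couple, same_place))  # 0.75
```

Unlike this count, inference in the logic network also takes into account every other rule and piece of evidence that touches the same users.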

6 Related Work

This work is related to four different research areas.

Information Extraction on Social Media: Much work has been devoted to the automatic extraction of well-structured information profiles from online social media, which mainly falls into two levels: the public level [49, 50, 90] and the user level. The former includes public event identification [19], event tracking [67], and event-referring expression extraction [74]. The latter focuses on user studies, examining users’ interests [3], timelines [46], personal events [47], or individual attributes such as age [70, 69], gender [12], political polarity [13], location [78], job and education [48], and student information (e.g., major, year of matriculation) [58].

The first step of the proposed approach relies heavily on the attribute extraction algorithm described in [48], which extracts three categories of user attributes (i.e., education, job and spouse) for a given user based on their posts. [48] gathers training data using distant supervision, where Google+ is used as a “knowledge base” to provide supervision. The algorithm returns the probability that each of the following predicates holds: Work-in(usr,entity) (job), Study-at(usr,entity) (education) and Spouse(usr1,usr2) (spouse).

Homophily: Our work is based on the fundamental homophily property of online users [54], summarized by the proverb “birds of a feather flock together” [2], which assumes that people sharing more attributes or background have a higher chance of becoming friends in social media, and that friends (or couples, or people living in the same location) tend to share more attributes. Such properties have been harnessed for applications like community detection [92] or friend recommendation [27].

Data Harvesting: The techniques adopted in like/dislike attribute extraction are related to a strand of work in data harvesting/information extraction, the point of which is to use some seeds to harvest some data, which is used to learn additional rules or patterns to harvest more data [16, 31, 40, 41, 73]. Distant supervision is another methodology for data harvesting [15, 28, 57] that relies on structured data sources as a source of supervision for data harvesting from raw text.

Logic/Relational Reasoning: Logic reasoning, usually based on first-order logic representations, can be traced back to the early days of AI [59, 76], and has been extensively explored since then (e.g., [6, 14, 26, 32, 42, 45, 71, 72, 77, 81, 86, 87, 88, 89]). A variety of reasoning models have been proposed, based on ideas or concepts from the fields of graphical models, relational logic, or programming languages [7, 8, 60], each of which has its own generalization capabilities in terms of different types of data. Frameworks include Stochastic Logic Programs [61], which combine logic programming and log-linear models; Probabilistic Relational Networks [23], which incorporate Bayesian networks for reasoning; Relational Markov Networks [85], which use dataset queries as cliques and model the state of each clique in a Markov network; Relational Dependency Networks [62], which combine Bayes networks and Markov networks; and probabilistic similarity logic [7], which jointly considers probabilistic reasoning about similarities and relational structure.

A great number of applications benefit from logical reasoning, including natural language understanding (e.g., [6]), health modeling [21], group modeling [29], web link based clustering [22], object identification [20], trust analysis [30], and many more.

7 Conclusion and Discussion

In this work, we propose a framework for applying probabilistic logical reasoning to inference problems on social networks. Our two-step procedure first extracts logical predicates, each associated with a probability, from social networks, and then performs logical reasoning. We evaluated our system on predicting user attributes (gender, education, location), user relations (friend, spouse, same-location), and user preferences (liking or disliking different entities). Our results show that using probabilistic logical reasoning over the network improves the accuracy of the resulting predictions, demonstrating the effectiveness of the proposed framework.

Of course, the current system is particularly weak in recall, since many true user attributes or relations are simply never explicitly expressed on platforms like Twitter. Also, the “gold-standard” first-order logic predicates we extract are not really gold-standard. One promising direction is to integrate user information from different kinds of online social media. Many websites directly offer gold-standard attributes: Facebook contains user preferences for movies, books, religion, music, and locations; LinkedIn offers comprehensive professional information. Combining these different types of information will offer more evidence for decision making.

References

  • [1] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media, pages 30–38. Association for Computational Linguistics, 2011.
  • [2] F. Al Zamal, W. Liu, and D. Ruths. Homophily and latent attribute inference: Inferring latent attributes of twitter users from neighbors. In ICWSM, 2012.
  • [3] N. Banerjee, D. Chakraborty, K. Dasgupta, S. Mittal, A. Joshi, S. Nagar, A. Rai, and S. Madan. User interests in social media sites: an exploration with micro-blogs. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1823–1826. ACM, 2009.
  • [4] I. Beltagy, K. Erk, and R. Mooney. Probabilistic soft logic for semantic textual similarity. Proceedings of Association for Computational Linguistics (ACL-14), 2014.
  • [5] Y. Bengio, H. Schwenk, J.-S. Senécal, F. Morin, and J.-L. Gauvain. Neural probabilistic language models. In Innovations in Machine Learning, pages 137–186. Springer, 2006.
  • [6] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on freebase from question-answer pairs. In EMNLP, pages 1533–1544, 2013.
  • [7] M. Brocheler, L. Mihalkova, and L. Getoor. Probabilistic similarity logic. arXiv preprint arXiv:1203.3469, 2012.
  • [8] M. Broecheler and L. Getoor. Computing marginal distributions over continuous markov networks for statistical relational learning. In Advances in Neural Information Processing Systems, pages 316–324, 2010.
  • [9] J. D. Burger, J. Henderson, G. Kim, and G. Zarrella. Discriminating gender on twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1301–1309. Association for Computational Linguistics, 2011.
  • [10] Z. Cheng, J. Caverlee, and K. Lee. You are where you tweet: a content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 759–768. ACM, 2010.
  • [11] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics, 2005.
  • [12] M. Ciot, M. Sonderegger, and D. Ruths. Gender inference of twitter users in non-english contexts. In EMNLP, pages 1136–1145, 2013.
  • [13] M. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini. Political polarization on twitter. In ICWSM, 2011.
  • [14] V. S. Costa, D. Page, M. Qazi, and J. Cussens. Clp (bn): Constraint logic programming for probabilistic knowledge. In Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence, pages 517–524. Morgan Kaufmann Publishers Inc., 2002.
  • [15] M. Craven, J. Kumlien, et al. Constructing biological knowledge bases by extracting information from text sources. In ISMB, volume 1999, pages 77–86, 1999.
  • [16] D. Davidov, A. Rappoport, and M. Koppel. Fully unsupervised discovery of concept-specific relationships by web mining. 2007.
  • [17] C. A. Davis Jr, G. L. Pappa, D. R. R. de Oliveira, and F. de L Arcanjo. Inferring the location of twitter messages based on user relationships. Transactions in GIS, 15(6):735–751, 2011.
  • [18] S. Debnath, N. Ganguly, and P. Mitra. Feature weighting in content based recommendation system using social network analysis. In Proceedings of the 17th international conference on World Wide Web, pages 1041–1042. ACM, 2008.
  • [19] Q. Diao, J. Jiang, F. Zhu, and E.-P. Lim. Finding bursty topics from microblogs. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 536–544. Association for Computational Linguistics, 2012.
  • [20] P. Domingos. Multi-relational record linkage. In Proceedings of the KDD-2004 Workshop on Multi-Relational Data Mining. Citeseer, 2004.
  • [21] S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with probabilistic soft logic.
  • [22] G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of web communities. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 150–160. ACM, 2000.
  • [23] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In IJCAI, volume 99, pages 1300–1309, 1999.
  • [24] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1–12, 2009.
  • [25] M. Goeksel and C. P. Lam. System and method for utilizing social networks for collaborative filtering, Mar. 30 2010. US Patent 7,689,452.
  • [26] B. Goertzel, C. Pennachin, and N. Geisweiller. Probabilistic logic networks. In Engineering General Intelligence, Part 2, pages 275–291. Springer, 2014.
  • [27] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pages 194–201. ACM, 2010.
  • [28] R. Hoffmann, C. Zhang, X. Ling, L. Zettlemoyer, and D. S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 541–550. Association for Computational Linguistics, 2011.
  • [29] B. Huang, S. H. Bach, E. Norris, J. Pujara, and L. Getoor. Social group modeling with probabilistic soft logic. In NIPS Workshop on Social Network and Social Media Analysis: Methods, Models, and Applications, 2012.
  • [30] B. Huang, A. Kimmig, L. Getoor, and J. Golbeck. Probabilistic soft logic for trust analysis in social networks. In International Workshop on Statistical Relational AI, pages 1–8, 2012.
  • [31] S. P. Igo and E. Riloff. Corpus-based semantic lexicon induction with web-based corroboration. In Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, pages 18–26. Association for Computational Linguistics, 2009.
  • [32] M. Jaeger et al. Probabilistic reasoning in terminological logics. KR, 94:305–316, 1994.
  • [33] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems, pages 135–142. ACM, 2010.
  • [34] T. Joachims. Making large scale svm learning practical. 1999.
  • [35] H. Kautz, B. Selman, and M. Shah. Referral web: combining social networks and collaborative filtering. Communications of the ACM, 40(3):63–65, 1997.
  • [36] S.-M. Kim and E. Hovy. Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 1–8. Association for Computational Linguistics, 2006.
  • [37] A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor. A short introduction to probabilistic soft logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pages 1–4, 2012.
  • [38] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 195–202. ACM, 2009.
  • [39] E. Kouloumpis, T. Wilson, and J. Moore. Twitter sentiment analysis: The good the bad and the omg! ICWSM, 11:538–541, 2011.
  • [40] Z. Kozareva and E. Hovy. Learning arguments and supertypes of semantic relations using recursive patterns. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1482–1491. Association for Computational Linguistics, 2010.
  • [41] Z. Kozareva and E. Hovy. Not all seeds are equal: Measuring the quality of text mining seeds. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 618–626. Association for Computational Linguistics, 2010.
  • [42] T. Kwiatkowski, E. Choi, Y. Artzi, and L. Zettlemoyer. Scaling semantic parsers with on-the-fly ontology matching. 2013.
  • [43] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
  • [44] S. L. Lauritzen. The em algorithm for graphical association models with missing data. Computational Statistics & Data Analysis, 19(2):191–201, 1995.
  • [45] M. Lewis and M. Steedman. Combined distributional and logical semantics. TACL, 1:179–192, 2013.
  • [46] J. Li and C. Cardie. Timeline generation: tracking individuals on twitter. In Proceedings of the 23rd international conference on World wide web, pages 643–652. International World Wide Web Conferences Steering Committee, 2014.
  • [47] J. Li, A. Ritter, C. Cardie, and E. Hovy. Major life event extraction from twitter based on congratulations/condolences speech acts. In Proceedings of Empirical Methods in Natural Language Processing, 2014.
  • [48] J. Li, A. Ritter, and E. Hovy. Weakly supervised user profile extraction from twitter. ACL, 2014.
  • [49] C. X. Lin, B. Zhao, Q. Mei, and J. Han. Pet: a statistical model for popular events tracking in social communities. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 929–938. ACM, 2010.
  • [50] J. Lin, R. Snow, and W. Morgan. Smoothing techniques for adaptive online language models: topic tracking in tweet streams. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 422–429. ACM, 2011.
  • [51] R. J. Little and D. B. Rubin. Statistical analysis with missing data. 2002.
  • [52] F. Liu and H. J. Lee. Use of social network information to enhance collaborative filtering performance. Expert Systems with Applications, 37(7):4772–4778, 2010.
  • [53] D. Lowd and P. Domingos. Efficient weight learning for markov logic networks. In Knowledge Discovery in Databases: PKDD 2007, pages 200–211. Springer, 2007.
  • [54] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual review of sociology, pages 415–444, 2001.
  • [55] T. Mikolov, M. Karafiát, L. Burget, J. Cernockỳ, and S. Khudanpur. Recurrent neural network based language model. In INTERSPEECH, pages 1045–1048, 2010.
  • [56] T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, and S. Khudanpur. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5528–5531. IEEE, 2011.
  • [57] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 1003–1011. Association for Computational Linguistics, 2009.
  • [58] A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel. You are who you know: inferring user profiles in online social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 251–260. ACM, 2010.
  • [59] R. Montague. Universal grammar. Theoria, 36(3):373–398, 1970.
  • [60] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19:629–679, 1994.
  • [61] S. Muggleton et al. Stochastic logic programs. Advances in inductive logic programming, 32:254–264, 1996.
  • [62] J. Neville and D. Jensen. Relational dependency networks. The Journal of Machine Learning Research, 8:653–692, 2007.
  • [63] F. Niu, C. Ré, A. Doan, and J. Shavlik. Tuffy: Scaling up statistical inference in markov logic networks using an rdbms. Proceedings of the VLDB Endowment, 4(6):373–384, 2011.
  • [64] O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith. Improved part-of-speech tagging for online conversational text with word clusters. In HLT-NAACL, pages 380–390, 2013.
  • [65] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In LREC, 2010.
  • [66] M. Pennacchiotti and A.-M. Popescu. A machine learning approach to twitter user classification. ICWSM, 11:281–288, 2011.
  • [67] A.-M. Popescu, M. Pennacchiotti, and D. Paranjpe. Extracting events and event descriptions from twitter. In Proceedings of the 20th international conference companion on World wide web, pages 105–106. ACM, 2011.
  • [68] K. Raghunathan, H. Lee, S. Rangarajan, N. Chambers, M. Surdeanu, D. Jurafsky, and C. Manning. A multi-pass sieve for coreference resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 492–501. Association for Computational Linguistics, 2010.
  • [69] D. Rao and D. Yarowsky. Detecting latent user properties in social media. In Proc. of the NIPS MLSN Workshop, 2010.
  • [70] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents, pages 37–44. ACM, 2010.
  • [71] M. Richardson and P. Domingos. Markov logic networks. Machine learning, 62(1-2):107–136, 2006.
  • [72] S. Riedel, L. Yao, A. McCallum, and B. M. Marlin. Relation extraction with matrix factorization and universal schemas. 2013.
  • [73] E. Riloff, R. Jones, et al. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI/IAAI, pages 474–479, 1999.
  • [74] A. Ritter, O. Etzioni, S. Clark, et al. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1104–1112. ACM, 2012.
  • [75] A. Ritter, L. Zettlemoyer, Mausam, and O. Etzioni. Modeling missing data in distant supervision for information extraction. TACL, 1:367–378, 2013.
  • [76] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM (JACM), 12(1):23–41, 1965.
  • [77] C. C. D. Roth. Feature extraction languages for propositionalized relational learning.
  • [78] A. Sadilek, H. Kautz, and J. P. Bigham. Finding your friends and following them to where you are. In Proceedings of the fifth ACM international conference on Web search and data mining, pages 723–732. ACM, 2012.
  • [79] H. Saif, Y. He, and H. Alani. Semantic sentiment analysis of twitter. In The Semantic Web–ISWC 2012, pages 508–524. Springer, 2012.
  • [80] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295. ACM, 2001.
  • [81] S. Schoenmackers, O. Etzioni, D. S. Weld, and J. Davis. Learning first-order Horn clauses from web text. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1088–1098. Association for Computational Linguistics, 2010.
  • [82] P. Singla and P. Domingos. Discriminative training of Markov logic networks. In AAAI, volume 5, pages 868–873, 2005.
  • [83] P. Singla and P. Domingos. Entity resolution with Markov logic. In Data Mining, 2006. ICDM’06. Sixth International Conference on, pages 572–582. IEEE, 2006.
  • [84] C. Tang, K. Ross, N. Saxena, and R. Chen. What’s in a name: A study of names, gender inference, and gender behavior in Facebook. In Database Systems for Advanced Applications, pages 344–356. Springer, 2011.
  • [85] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 485–492. Morgan Kaufmann Publishers Inc., 2002.
  • [86] W. Y. Wang, K. Mazaitis, and W. W. Cohen. Programming with personalized PageRank: A locally groundable first-order probabilistic logic. Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013), 2013.
  • [87] W. Y. Wang, K. Mazaitis, and W. W. Cohen. ProPPR: Efficient first-order probabilistic logic programming for structure discovery, parameter learning, and scalable inference. Proceedings of the AAAI 2014 Workshop on Statistical Relational AI (StarAI 2014), 2014.
  • [88] W. Y. Wang, K. Mazaitis, and W. W. Cohen. Structure learning via parameter learning. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), 2014.
  • [89] W. Y. Wang, K. Mazaitis, N. Lao, T. Mitchell, and W. W. Cohen. Efficient inference and learning in a large knowledge base: Reasoning with extracted information using a locally groundable first-order probabilistic logic. In progress, 2014.
  • [90] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210, 2005.
  • [91] B. Yang and C. Cardie. Joint inference for fine-grained opinion extraction. In ACL (1), pages 1640–1649, 2013.
  • [92] J. Yang and J. Leskovec. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 587–596. ACM, 2013.