Due to the rapid progress in the field of AI, there have been widespread discussions and concerns about the threats and benefits of AI. These discussions cover topics such as improving the everyday lives of individuals (https://goo.gl/ViLdgV) and ethical issues associated with intelligent systems (https://goo.gl/5KlmXk). Social media platforms are ideal repositories of opinions and discussion threads. Twitter is one such popular platform where individuals post their statuses, opinions and perceptions about ongoing issues in society [Java et al.2007, Naaman, Boase, and Lai2010, Yang and Counts2010]. Given the ease of finding individuals’ perceptions and opinions on this platform, this paper investigates the posts about AI shared on Twitter. Specifically, we compare and contrast the perceptions of users from two categories: general AI-Tweeters (AIT) and expert AI-Tweeters (EAIT). We believe that the findings from this analysis can inform research funding agencies, organizations, industries, and especially the AAAI community about the public perceptions of AI.
There have been earlier efforts to understand public perceptions of AI. Recent work by Fast et al. [Fast and Horvitz2016] conducts a longitudinal study of how articles published on AI in the New York Times between January 1986 and May 2016 have evolved over the years. This study revealed that since 2009 the discussion on AI has sharply increased and is more optimistic than pessimistic. It also found that fears about losing control over AI systems have been increasing in recent years. Another recent survey [Gaines-Ross2016], conducted by the Harvard Business Review on individuals who do not have any background in technology, reported positive perceptions of AI among these individuals. In contrast, despite online social media platforms being the main channels for communicating personal opinions [Java et al.2007, Naaman, Boase, and Lai2010, Yang and Counts2010], to our knowledge there is no existing work on how users think and what users share about AI on these platforms. We aim to analyze the perceptions of individuals as manifested in their posts shared on Twitter.
Towards this goal, we attempt to answer 5 important questions through a thorough quantitative and comparative investigation of the posts shared on Twitter. 1) What are the insights that could be learned by characterizing the individuals and their interests who are making AI-related tweets? 2) What is the Twitter engagement rate for the AI-tweets? 3) Are the posts about AI optimistic or pessimistic? 4) What are the most interesting topics of discussion about AI to the users? 5) What can we learn about the trends of semantic co-occurrences about AI? We address each of these questions in the next few sections.
Our analysis reveals intriguing differences between the posts shared by AIT and EAIT on Twitter. Specifically, it reveals five interesting findings about the perceptions of individuals who use Twitter to share their opinions about AI. First, users from both categories are emotionally positive (or optimistic) towards the progress of AI. Second, even though users are positive overall, expert users are more negative than AIT. Third, the topics extracted from the tweets show that expert users share a large percentage of tweets about their personal news. Fourth, the effects of automation on the future are of predominant concern to AIT. Lastly, when we sub-categorized EAIT, the emotional analysis of AI-related tweets revealed that academicians are less positive and more social than students and industry professionals.
In the next section, we describe the process of collecting data from the two categories of users considered in this paper. Thereafter, we compare and contrast the perceptions of AIT and EAIT on each of the 5 research questions we posed earlier. We then summarize the findings of this investigation in the conclusions.
2 Data Collection
2.1 AI-Tweeters (AIT)
We employ the official Twitter API (https://dev.twitter.com/overview/api) along with a frequency-based hashtag selection approach to crawl data from general Twitter users who tweet about AI. We first identify an appropriate set of hashtags that focus on artificial intelligence in social media. In our case, these are #ai and #artificialintelligence. With these seed hashtags, we crawled 2 million unique tweets and then iteratively extracted hashtags to identify those that most frequently co-occur with the seed set. We then remove non-technical hashtags (for example, #trump, #politics, etc.) from this sorted hashtag list. The top-15 co-occurring hashtags after this pre-processing are shown in Table 1.
|1. #ai||2. #artificialintelligence||3. #machinelearning|
|4. #bigdata||5. #iot||6. #deeplearning|
|7. #robotics||8. #datascience||9. #cybersecurity|
|10. #vr||11. #ar||12. #nlp|
|13. #ux||14. #algorithms||15. #socialmedia|
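The frequency-based hashtag selection described before Table 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: the function name `cooccurring_hashtags`, the regular expression, the sample tweets and the exclusion list are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def cooccurring_hashtags(tweets, seeds, excluded=frozenset()):
    """Count hashtags that appear in the same tweet as any seed hashtag."""
    seeds = {s.lower() for s in seeds}
    excluded = {e.lower() for e in excluded}
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in HASHTAG_RE.findall(text)}
        if tags & seeds:
            # Count each co-occurring tag once per tweet, skipping seeds
            # and manually excluded (non-technical) tags.
            counts.update(tags - seeds - excluded)
    return counts

# Toy data standing in for crawled tweets.
sample_tweets = [
    "New #AI course on #MachineLearning and #BigData",
    "#artificialintelligence meets #machinelearning",
    "#ai and #politics today",
    "Unrelated #iot post",
]
ranked = cooccurring_hashtags(
    sample_tweets,
    seeds={"ai", "artificialintelligence"},
    excluded={"politics"},
)
print(ranked.most_common(2))
```

Sorting the resulting counts in decreasing order gives a ranked list of the kind summarized in Table 1; the manual removal of non-technical hashtags is modeled here by the `excluded` argument.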
We used the top-4 hashtags from Table 1 (#ai, #artificialintelligence, #machinelearning and #bigdata) as the final hashtag set to crawl a set of 2.3 million tweets. We found that the set of tweets obtained using these 4 hashtags is a superset of all the tweets crawled using the remaining hashtags presented in Table 1. Each tweet in this dataset is public and contains the following post-related information:
- number of favorites received
- number of times it is retweeted
- the URL links shared as a part of it
- text of the tweet including the hashtags
- geolocation if tagged
A tweet may contain more than a single hashtag. From this set of tweets, we remove the duplicate copies of tweets that were retrieved under more than one of the four hashtags considered. This resulted in a dataset of 0.2 million unique tweets posted by a unique set of 33K users. Due to the download limit of the Twitter API, all the tweets in our dataset are from February 2017. None of the tweets we crawled were tagged with a geolocation.
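The de-duplication step above can be sketched as follows, assuming each crawled tweet is a dictionary carrying a unique `id` and a `user` field. These field names and the helper `deduplicate_tweets` are hypothetical and chosen only for illustration.

```python
def deduplicate_tweets(tweets_by_hashtag):
    """Merge per-hashtag crawls, keeping one copy of each tweet ID."""
    unique = {}
    for tweets in tweets_by_hashtag.values():
        for tweet in tweets:
            # setdefault keeps the first copy seen for a given ID.
            unique.setdefault(tweet["id"], tweet)
    return list(unique.values())

# Toy crawl: tweet 1 was retrieved under two hashtags.
crawl = {
    "#ai": [
        {"id": 1, "user": "a", "text": "#ai #bigdata"},
        {"id": 2, "user": "b", "text": "#ai"},
    ],
    "#bigdata": [
        {"id": 1, "user": "a", "text": "#ai #bigdata"},
        {"id": 3, "user": "a", "text": "#bigdata"},
    ],
}
unique_tweets = deduplicate_tweets(crawl)
unique_users = {t["user"] for t in unique_tweets}
print(len(unique_tweets), len(unique_users))
```

Keying on the tweet ID rather than the text is a design choice of this sketch; it treats two tweets with identical text but different IDs as distinct posts.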
|Mean||11.033 (86.9)||11.10 (84)|
|Median||11.0 (91)||11.0 (89)|
|Min.||1 (0)||1 (0)|
|Max.||30 (140)||52 (140)|
2.2 Expert AI-Tweeters (EAIT)
We manually compile a list of AI experts whom we consider as a seed set to crawl the expert Twitter users. From this seed set, we crawl their friends (users they are following) who are also experts in AI. Through a snowballing approach on the friends list, we compile the EAIT list that contains 9851 expert users. Using the user biography, we label a given user as an EAIT based on these two conditions:
- No vocabulary related to politics, business or news media is mentioned (for example, reporter, organization, marketing, blockchain, breaking, etc.);
- Vocabulary related to AI is used (for example, machinelearning, ai, vision, researcher, #ai, etc.).
The vocabulary used in this labeling is composed by leveraging the AI vocabulary compiled here: https://goo.gl/ApCbnu.
By utilizing the seed set of 100 EAITs, we finally obtain a list of 9851 users. We then use a keyword-based approach to classify this set of users. This classification reveals that 35% of EAIT are industry professionals, 10% are academicians, 6% are students and the rest are unclassified. For example, if a user mentions the keyword ‘student’ as part of their biography, that user is categorized as a student. To categorize a user as an academician, we search for keywords such as ‘professor’, ‘faculty’, ‘lecturer’, ‘teacher’, etc. Similarly, we label an expert user as an industry professional if the Twitter biography contains keywords such as ‘engineer’, ‘scientist’, ‘director’, ‘developer’, ‘founder’, etc. 0.06% of the tweets we crawled are tagged with geolocation.
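The biography-based labeling and sub-categorization described above can be sketched as follows. The keyword sets contain only the example terms quoted in the text, not the full vocabularies. The order in which categories are checked (student, then academician, then industry professional) is an assumption of this sketch, since the text does not say how a biography matching several categories is resolved.

```python
import re

# Example terms quoted in the text; the real lists are longer.
EXCLUDE_TERMS = {"reporter", "organization", "marketing", "blockchain", "breaking"}
AI_TERMS = {"machinelearning", "ai", "vision", "researcher"}

# Assumed precedence: the first matching category wins.
CATEGORY_KEYWORDS = [
    ("student", {"student"}),
    ("academician", {"professor", "faculty", "lecturer", "teacher"}),
    ("industry professional",
     {"engineer", "scientist", "director", "developer", "founder"}),
]

def tokenize(bio):
    """Lowercase the biography and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", bio.lower()))

def is_expert(bio):
    """Both conditions: some AI vocabulary and no excluded vocabulary."""
    tokens = tokenize(bio)
    return bool(tokens & AI_TERMS) and not (tokens & EXCLUDE_TERMS)

def categorize(bio):
    """Return the first sub-category whose keywords appear in the bio."""
    tokens = tokenize(bio)
    for label, keywords in CATEGORY_KEYWORDS:
        if tokens & keywords:
            return label
    return "unclassified"

for bio in ["PhD student working on computer vision",
            "Breaking news reporter covering AI",
            "Professor of #AI"]:
    print(bio, "|", is_expert(bio), "|", categorize(bio))
```

Because the tokenizer drops the `#` character, `#ai` and `ai` are treated as the same term in this sketch.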
3 Characterization of Users
3.1 Influence attributes
Before we investigate the research questions in depth, we present a few details about the demographics of users from AIT and EAIT. To understand the differences between the two types of users, we first focus on the influence attributes. The influence attributes we consider are #statuses shared, #followers, #friends and #favorites. These attributes provide a useful perspective on the activity of users and how their tweets influence other Twitter users. To study these attributes, we plot the logarithmic frequencies of these four attributes for AIT and EAIT in Figure 1.
Figure 1 shows that, from the perspective of sharing and favoriting tweets, both sets of users are active on Twitter, but general users post a relatively higher percentage of tweets than EAIT. AIT share more statuses than they favorite. However, in certain cases, as shown in Figure 1, EAIT favorite roughly as many tweets as they share. Considering the other influence attributes, followers and friends, both sets of users on average have more followers than friends (users they are following). EAIT have a relatively larger number of followers than AIT. For AIT, the number of posts shared is relatively larger than the number of followers they have. In contrast, for EAIT, the number of followers is many times larger than the number of statuses they share. Existing literature [Kwak, Chun, and Moon2011, Kwak, Moon, and Lee2012] shows that the numbers of retweets and favorites are correlated with followers. This plot suggests that a user’s expertise in AI is positively associated with the number of followers.
|User Category||Geographical Locations|
|AIT||USA (3.4%), India (2.8%), CA (2.6%), France (2.4%), England (1.9%), NY (1.8%), UK (1.6%), London (1.4%), Germany (0.9%), Paris (0.9%)|
|EAIT||CA (9.7%), NY (4.5%), USA (3.2%), England (2.8%), France (2.7%), MA (2.2%), UK (2.1%), London (1.9%), SF (1.8%), Germany (1.7%)|
Table 2 compares the statistics about the length of posts made by AIT and EAIT. On average, tweets made by EAIT are longer than the tweets posted by AIT. We then focus on the users’ professional background and geographical location, obtained from their profile biographies. 25.1% of users categorized as AIT did not provide their geographical location as part of their profile, whereas only 15.5% of users categorized as EAIT did not state their geographical location. Table 3 shows that for the AIT category, among the top-10 stated locations, 6.77% of users are from Europe, 7.74% are from the United States (14% more users than from Europe) and 2.8% are from India. For the EAIT category, 11.17% of users are from Europe, 21.4% are from the United States (almost 91.5% more users than from Europe) and 0% are from India. A large percentage of experts are from Europe and the United States, whereas India appears among the top locations only for non-experts talking about AI.
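The relative comparisons quoted above appear to be the percentage by which the United States share exceeds the European share. A minimal sketch of that arithmetic, using the aggregate shares stated in the text, is given below; the helper name `relative_excess` is hypothetical.

```python
def relative_excess(a, b):
    """Percentage by which share a exceeds share b."""
    return (a - b) / b * 100.0

# Aggregate shares (in percent) as stated in the text.
ait_us, ait_europe = 7.74, 6.77
eait_us, eait_europe = 21.4, 11.17

ait_excess = relative_excess(ait_us, ait_europe)
eait_excess = relative_excess(eait_us, eait_europe)
print(round(ait_excess, 1), round(eait_excess, 1))
```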
|AIT||manager, entrepreneur, consultant, founder, developer, engineer, writer, author, blogger, strategist|
|EAIT||scientist, student, researcher, engineer, professor, cofounder, ceo, founder, director, entrepreneur|
We conduct a unigram-based analysis of the users’ profiles to obtain their professional background. Table 4 shows that, based on the frequencies of professions stated by users on Twitter, the majority of the Twitter users contributing AI-related tweets are pursuing careers in technology.
3.2 Overall Topics of Interest
Since we are characterizing the users, we crawl the most recent 100 posts shared by these users to examine their interests in general. These posts may talk about AI and non-AI topics. We believe that extracting latent topics over the users’ timelines, irrespective of the type of post, can help measure the level of users’ interest in technology, which can indirectly act as a metric to understand their perceptions about AI. We extract the topics from the posts made by users using the Twitter LDA package [Zhao et al.2011].
As mentioned earlier, a total of 33K unique users contributed to our dataset. We crawl their recent tweets to extract the topics. We empirically decided to extract 5 topics and their percentage distribution for each individual’s tweets. We then aggregate the distributions of all users across these topics; the resulting percentage distributions are shown in Figure 2. The topic distributions show that a large percentage of users are interested in business analytics, and that on average 20% of the tweets made by AIT are about their personal news.
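One simple way to aggregate per-user topic distributions into overall percentages is to normalize each user's distribution and average across users. The text does not state the exact aggregation used, so the sketch below is an assumption, and the per-user weights are made-up toy values rather than output of the Twitter LDA package.

```python
def aggregate_topic_shares(user_distributions):
    """Average per-user topic distributions into overall percentages."""
    n_users = len(user_distributions)
    n_topics = len(user_distributions[0])
    totals = [0.0] * n_topics
    for dist in user_distributions:
        total = sum(dist)
        for k, weight in enumerate(dist):
            # Normalize so that every user contributes equally.
            totals[k] += weight / total
    return [100.0 * t / n_users for t in totals]

# Toy input: two users, five topics each.
per_user = [
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.2, 0.1, 0.1],
]
shares = aggregate_topic_shares(per_user)
print([round(s, 1) for s in shares])
```

Averaging normalized distributions weights every user equally, regardless of how many tweets they posted; weighting by tweet count would be an alternative choice.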
We conduct a similar investigation on the tweets posted by experts. These topics reveal that experts post equal percentages of posts about their personal news and about technical implementations of AI systems. These topics are different from the topics AIT focus on. The pie chart shown in Figure 3 reveals that more than 77% of the tweets posted by experts on Twitter are about technology. However, EAIT share a significant percentage of personal opinions and statuses, whereas AIT post the smallest percentage of tweets about technology.
4 Twitter Engagement
We seek to study the attributes that disclose the holistic picture of the overall engagement rate of AI-related tweets. We believe that the engagement rate has a potential to provide us with the necessary information on the patterns of public interests and perceptions in AI. We measure the engagement by considering the favorites received by a tweet, replies to a tweet and mentions of a Twitter post. The favorites here are the number of likes received by a tweet posted by a user who belongs to either AIT or EAIT. Note that these favorites are different from the favorites we considered as the influence attribute in Section 3.1. We first compute Twitter engagement statistics that are shown in Table 5.
|Min (Max)||Median (Mean)|
|Retweets||0 (1041)||0 (1701)||0.0 (1.5)||0.0 (3.28)|
|Favorites||0 (1268)||0 (1914)||0.0 (1.46)||1.0 (4.98)|
|Mentions||0 (9)||0 (10)||0.0 (0.63)||0.0 (0.54)|
Tweets made by EAIT are more likely to be retweeted than favorited by the users on this platform. 71.93% of EAIT’s tweets are retweeted at least once and 31.14% are favorited at least once. The percentage of EAIT’s posts that are retweeted is significantly higher than that of the general dataset, which is 11.99% according to the existing literature [Suh et al.2010]. In contrast, the percentage of tweets posted by AIT that are retweeted at least once (69.38%) is almost the same as the percentage that are favorited at least once (67.8%).
11.45% of the tweets from AIT contain at least one user handle, whereas 67.57% of the tweets from EAIT do. This shows that experts are more likely than AIT to interact or engage in discussions with each other about AI on Twitter. The literature [Java et al.2007, Kwak et al.2010, Yang and Counts2010] considers retweeting as one of the features for measuring information diffusion. Based on these results, tweets posted by EAIT diffuse faster (higher retweet rate) than the tweets posted by AIT.
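The engagement statistics discussed in this section can be computed along the following lines. This is a sketch under assumptions: each tweet is taken to be a dictionary with hypothetical `text`, `retweets` and `favorites` fields, and a mention is approximated as `@` followed by word characters.

```python
import re
from statistics import mean, median

# Hypothetical pattern for user handles.
MENTION_RE = re.compile(r"@\w+")

def describe(values):
    """Min, max, median, mean and share of non-zero values."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
        "pct_at_least_once": 100.0 * sum(v > 0 for v in values) / len(values),
    }

def engagement_summary(tweets):
    """Summarize retweets, favorites and mentions over a set of tweets."""
    return {
        "retweets": describe([t["retweets"] for t in tweets]),
        "favorites": describe([t["favorites"] for t in tweets]),
        "mentions": describe(
            [len(MENTION_RE.findall(t["text"])) for t in tweets]
        ),
    }

# Toy data standing in for one user category.
toy_tweets = [
    {"text": "@alice great talk on #ai", "retweets": 0, "favorites": 1},
    {"text": "new #machinelearning paper", "retweets": 2, "favorites": 0},
    {"text": "@bob @carol thoughts on #bigdata?", "retweets": 0, "favorites": 0},
    {"text": "#ai demo", "retweets": 6, "favorites": 3},
]
summary = engagement_summary(toy_tweets)
print(summary["retweets"])
```

Running the same function separately on the AIT and EAIT tweets would give the two sides of a comparison like the one in Table 5.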
5 Optimistic or Pessimistic
In this work, we measure the emotion attributes optimism and pessimism in terms of positive and negative emotions. Alongside, we assess other emotion-related attributes such as cognitive mechanisms, insights and social aspects. Tausczik et al. [Tausczik and Pennebaker2010], in their work introducing LIWC, mention that the way people express emotion and the degree to which they express it can tell us how people are experiencing the world. Existing literature [Danescu-Niculescu-Mizil, Gamon, and Dumais2011, Tsur and Rappoport2012, Tumasjan et al.2010] states that LIWC is powerful in accurately identifying emotion in language use. Accordingly, we employ the psycho-linguistic tool LIWC to measure the emotionality expressed in the tweets.
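LIWC-style scoring reports, for each category, the percentage of words in a text that belong to that category's dictionary. The sketch below illustrates the idea with a tiny made-up lexicon; the LIWC dictionaries themselves are proprietary and far larger, so `TOY_LEXICON` and its category names are assumptions for illustration only.

```python
import re

# Hypothetical miniature lexicon, NOT the LIWC dictionary.
TOY_LEXICON = {
    "posemo": {"great", "love", "exciting", "good"},
    "negemo": {"fear", "threat", "bad", "worried"},
}

def category_percentages(text, lexicon=TOY_LEXICON):
    """Percentage of words in the text that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100.0 * sum(w in vocab for w in words) / len(words)
        for category, vocab in lexicon.items()
    }

scores = category_percentages(
    "Exciting and good progress in AI but some fear remains"
)
print(scores)
```

Averaging such per-tweet percentages over all tweets in a user category gives one aggregate index per emotion attribute.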
Figure 4 reveals that users categorized as AIT are more positive (65% greater than negative) and optimistic towards AI and its related topics. The horizontal axis in this figure represents the value of a given emotion attribute, and the frequency of that value is plotted on the corresponding vertical axis. The distribution in each plot is normalized so that the values across the different buckets of an emotion metric sum to 100%. These results concur with the recent literature [Fast and Horvitz2016, Gaines-Ross2016] on New York Times articles and interviews with individuals. This is a useful finding because prior work shows that posts on Twitter tend to be more emotionally negative [Manikonda, Meduri, and Kambhampati2016]. In other words, despite the generally negative emotional content on Twitter, this subset of tweets focusing on artificial intelligence is more positive than negative.
We conduct a similar emotion analysis on tweets posted by experts. It reveals findings similar to those for AIT (shown in Figure 5), but with relatively higher negativity and higher cognitive mechanisms. Cognitive mechanisms, or cognitive complexity, can be described as a richer way of reasoning [Tausczik and Pennebaker2010]. When a set of posts has high values of cognitive mechanisms, we take this to mean that the set contains a large percentage of technical content. The horizontal axis in Figure 5 represents the intensity of a given emotion, and the vertical axis represents the number of posts with the corresponding value. Compared to AIT, tweets made by EAIT have more negativity overall. However, positive emotion is four times as dominant as negative emotion. When we compare the positive and negative emotions of the two categories of users, the results reveal that expert users (pos-index: 3.25; neg-index: 0.60) express almost twice as much negative emotion as AIT (pos-index: 0.82; neg-index: 0.248).
Alongside, we conduct a more granular evaluation by comparing the emotion metrics across the three sub-categories of expert users: students, academicians and industry professionals. The aggregated values shown in Table 6 suggest that academicians are relatively less positive and more social than the users from the other two categories when tweeting about AI. These results corroborate the Twitter engagement results, where experts use a relatively larger percentage of mentions in their tweets to engage in discussions with others. These results also show that students and industry professionals share relatively more insights about AI in their tweets than academicians. According to Tausczik et al. [Tausczik and Pennebaker2010], insight words suggest the active reassessment of a theory.
6 Topics heavily discussed by users on Twitter about AI
In Section 3.2, we have presented the analysis on the interests of users by crawling their timelines and extracting topics from their timelines. In order to better understand the public perceptions about AI, we extract topics from the AI-related tweets. In this section we focus only on the AI-related tweets posted by users from AIT and EAIT. To perform this, we first consider all the tweets posted by AIT and EAIT separately. We utilize a keyword-based approach that looks for specific AI-related vocabulary in any given tweet. As there is a possibility that some tweets might be retweeted by the same set of users in a given category, we pre-process the two sets of AI-related tweets from AIT and EAIT. Once we clean the data, we then combine these two datasets to identify the latent topics. We empirically chose 6 topics to avoid redundancy and then identify the percentage of tweets that are contributing to each topic from these two categories – AIT and EAIT.
In Table 7, we present the topics extracted from the AI-related tweets. These topics show that the largest percentage of tweets shared by AIT (37%) focus on the effects of automation on the future, whereas the largest percentage of tweets made by EAIT (25%) concentrate on the technical implementations of AI systems. This topic is followed by tweets focusing on conferences and talks related to AI, which could be explained by the interests of the users considered as experts in this analysis. The emphasis on industry applications of AI is roughly equal among AIT and EAIT. As expected, the results show that AIT focus on general news about AI and the myths associated with AI more than the expert users do. The partial alignment of these topics with the findings of Fast et al. [Fast and Horvitz2016] suggests that individuals have maintained an interest in similar topics over the years.
|1||Effects of automation on future||future, business, human, jobs, revolution, automation, experience, impact, change, improve|
|2||AI applications from Industry||humans, google, elon, robots, facebook, brain, deepmind, cars, selfdriving, bots|
|3||Technical aspects of building models||learning, data, deep, analytics, algorithms, python, models, cloud, model, training|
|4||Daily news||latest, daily, news, tech, assistant, cars, mobile, voice, robot, speech, alexa|
|5||Myths & rise of AI||data, myths, automation, language, internet, age, machines, rise, language, read|
|6||Conference News||learning, talk, join, workshop, conference, event, meetup, summit, talking, panel, session|
7 Co-occurring concepts
The questions we have investigated so far provide valuable insights into whether and how individuals perceive the issues around AI advancements. However, we note that conceptual relationships between words could further help quantify and measure the perceptions of individuals. To address this, we employ the popular word2vec analysis to detect relationships between words that frequently co-occur. Word2Vec [Mikolov et al.2013] is a popular two-layer neural network that is used to process text. It takes a text corpus as input and generates feature vectors for the words present in that corpus. Word2Vec represents words in a higher-dimensional feature space and makes predictions about the meaning of a word based on its past occurrences. These vectors can then be used to detect relationships between words, which are highly accurate given enough data to learn the vectors.
To detect the relationships, we train the Word2Vec model separately on the 71,915 and 72,153 AI-related tweets posted by EAIT and AIT respectively. As a pre-processing step, we first remove stop words from the tweets and consider each tweet independently. We used pre-existing lists from academia (https://www.cs.utexas.edu/users/novak/aivocab.html) and industry (http://www.techrepublic.com/article/mini-glossary-ai-terms-you-should-know/) to manually compile an AI vocabulary of 61 words. Table 8 provides the top-10 words co-occurring with four AI-related keywords: agents, robots, ethics and privacy. The words in the table are sorted in decreasing order of their co-occurrence probability.
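To make the idea of ranking words by co-occurrence probability concrete, the sketch below uses a deliberately simplified, count-based stand-in. It is not Word2Vec and not the procedure used in this paper: for a given keyword, it estimates the fraction of keyword-bearing tweets that also contain each other word, after stop-word removal. The stop-word list, the function name `cooccurrence_ranking` and the sample tweets are assumptions.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in",
              "is", "are", "for", "on", "with"}

def cooccurrence_ranking(tweets, keyword, top_n=10):
    """Rank words by how often they share a tweet with the keyword."""
    keyword = keyword.lower()
    counts = Counter()
    n_keyword_tweets = 0
    for text in tweets:
        # Each tweet is treated independently, as a set of tokens.
        tokens = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        if keyword in tokens:
            n_keyword_tweets += 1
            counts.update(tokens - {keyword})
    if n_keyword_tweets == 0:
        return []
    return [(word, count / n_keyword_tweets)
            for word, count in counts.most_common(top_n)]

sample = [
    "Privacy and bias in AI systems",
    "Privacy needs accountability",
    "Bias and privacy are linked",
    "Robots are fun",
]
ranking = cooccurrence_ranking(sample, "privacy")
print(ranking[0])
```

A trained embedding model captures relationships beyond direct co-occurrence within a tweet, so rankings from this stand-in should only be read as a rough intuition for how a table like Table 8 is organized.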
The co-occurrence patterns shown in Table 8 suggest that AIT and EAIT use strikingly different terms.
- Agents – EAIT focus on the behavioral characteristics of intelligent agents, using terms such as inattention, intelligence and aggression. AIT also use similar behavior-related terms, but they also focus on the applications of these intelligent agents.
- Robots – AIT focus on the physical aspects of robots and their design issues. EAIT, however, associate robots with words that describe the types of robots and their usability.
- Ethics – AIT are in general concerned about ethics related to AI, expressed especially through words like empathy, moral and privacy. The words used by EAIT, however, show that they relate ethics to different aspects of the humanities.
- Privacy – AIT associate privacy with the drawbacks of AI systems (bias, discrimination, etc.), which sound more negative. EAIT, however, focus on the implementation aspects of AI systems that can maintain accountability and respect privacy.
Social media platforms are one of the primary channels of communication in the lives of individuals. These platforms are reshaping our ideas and the way we share those ideas. Given the increasing interest in AI from different communities, multiple debates are under way to evaluate the benefits and drawbacks of AI to humans and society as a whole. This paper presents the findings from our investigation of public perceptions about AI using the AI-related posts shared on Twitter. Alongside, we performed a comparative analysis of how users engage with the posts made by AIT and EAIT. Some of the key findings from our analysis are:
- The user characterization analysis revealed that users who post about AI on Twitter are predominantly from the USA and Europe.
- Tweets about AI are overall more positive than the general tweets posted on Twitter.
- AI-related tweets posted by EAIT are more negative than those posted by AIT.
- The effects of automation are of greater concern to the general AI tweeters than to the experts.
- A large percentage of the tweets made by the experts are about the technical implementations of AI systems and news about conferences.
- Tweets posted by students and industry professionals provide relatively more insights about AI than those posted by academicians.
- Academicians are relatively less positive and more social than students and industry professionals when tweeting about AI.
- Tweets posted by experts have higher diffusion than the tweets posted by general AI tweeters.
- The co-occurrence pattern mapping tells us that EAIT acknowledge the challenges in building intelligent systems that maintain accountability in terms of honoring ethics and privacy. The terms they use also focus on the implementation aspects of such systems, whereas the terms used by AIT refer to their concerns about the behavioral characteristics and drawbacks of AI systems.
We hope that our findings will benefit different organizations and communities who are debating about the benefits and threats of AI to our society. Some of the future directions include a longitudinal study across several years as well as multiple mediums of communication.
- [Danescu-Niculescu-Mizil, Gamon, and Dumais2011] Danescu-Niculescu-Mizil, C.; Gamon, M.; and Dumais, S. 2011. Mark my words!: Linguistic style accommodation in social media. In Proceedings of the 20th International Conference on World Wide Web.
- [Fast and Horvitz2016] Fast, E., and Horvitz, E. 2016. Long-term trends in the public perception of artificial intelligence. In AAAI.
- [Gaines-Ross2016] Gaines-Ross, L. 2016. What do people – not techies, not companies – think about artificial intelligence? In Harvard Business Review.
- [Java et al.2007] Java, A.; Song, X.; Finin, T.; and Tseng, B. 2007. Why we twitter: Understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, WebKDD/SNA-KDD ’07.
- [Kwak et al.2010] Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10.
- [Kwak, Chun, and Moon2011] Kwak, H.; Chun, H.; and Moon, S. 2011. Fragile online relationship: A first look at unfollow dynamics in twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).
- [Kwak, Moon, and Lee2012] Kwak, H.; Moon, S.; and Lee, W. 2012. More of a receiver than a giver: Why do people unfollow in twitter?
- [Manikonda, Meduri, and Kambhampati2016] Manikonda, L.; Meduri, V. V.; and Kambhampati, S. 2016. Tweeting the mind and instagramming the heart: Exploring differentiated content sharing on social media. In International AAAI Conference on Web and Social Media.
- [Mikolov et al.2013] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781.
- [Naaman, Boase, and Lai2010] Naaman, M.; Boase, J.; and Lai, C.-H. 2010. Is it really about me?: Message content in social awareness streams. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW ’10.
- [Suh et al.2010] Suh, B.; Hong, L.; Pirolli, P.; and Chi, E. H. 2010. Want to be retweeted? large scale analytics on factors impacting retweet in twitter network. In Proceedings of the 2010 IEEE Second International Conference on Social Computing, SOCIALCOM ’10.
- [Tausczik and Pennebaker2010] Tausczik, Y. R., and Pennebaker, J. W. 2010. The psychological meaning of words: Liwc and computerized text analysis methods. Journal of Language and Social Psychology 29(1):24–54.
- [Tsur and Rappoport2012] Tsur, O., and Rappoport, A. 2012. What’s in a hashtag?: Content based prediction of the spread of ideas in microblogging communities. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining.
- [Tumasjan et al.2010] Tumasjan, A.; Sprenger, T.; Sandner, P.; and Welpe, I. 2010. Predicting elections with twitter: What 140 characters reveal about political sentiment.
- [Yang and Counts2010] Yang, J., and Counts, S. 2010. Predicting the speed, scale, and range of information diffusion in twitter. ICWSM 10:355–358.
- [Zhao et al.2011] Zhao, W. X.; Jiang, J.; Weng, J.; He, J.; Lim, E.-P.; Yan, H.; and Li, X. 2011. Comparing twitter and traditional media using topic models. In Proceedings of the 33rd European Conference on Advances in Information Retrieval, ECIR’11.