The term clickbait refers to a form of web content that employs writing formulas and linguistic techniques in headlines to trick readers into clicking links [1, 2], but does not deliver on its promises (https://www.wired.com/2015/12/psychology-of-clickbait/). Media scholars and pundits consistently portray clickbait content in a bad light, but the industry built on this type of content has been growing rapidly and reaching more and more people across the world [3, 4]. Taboola, one of the key providers of clickbait content, claims (https://www.taboola.com/press-release/taboola-crosses-one-billion-user-mark-second-only-facebook-world’s-largest-discovery) to have doubled its monthly reach from million unique users to billion in a single year from March 2015. The growth of the clickbait industry appears to have a clear impact on the media ecosystem, as many traditional media organizations have started to use clickbait techniques to attract readers and generate revenue. However, media analysts suggest that news media risk losing readers’ trust and depleting brand value by using clickbait techniques that may boost advertising revenue only temporarily. According to a study performed by Facebook (https://www.nytimes.com/2014/08/26/business/media/facebook-takes-steps-against-click-bait-articles.html), users “preferred headlines that helped them decide if they wanted to read the full article before they had to click through”. A separate study [5] shows that clickbait headlines lead to negative reactions among media users.
Compared to the reach of clickbait content and its impact on the online media ecosystem, the amount of research on this topic is very small. No large-scale study has examined the extent to which different types of media use clickbait techniques. Little is known about the extent to which clickbait headlines contribute to user engagement on social networking platforms, which are major distributors of web content. This study seeks to fill this gap by examining the use of clickbait techniques in headlines by mainstream and unreliable media organizations on the social network. Some of the questions we answer in this paper are: (i) to what extent do mainstream and unreliable media organizations use clickbait? (ii) does the topic distribution differ between clickbait and non-clickbait contents? (iii) which type of headline, clickbait or non-clickbait, generates more user engagement (e.g., shares, comments, reactions)?
We first create a set of supervised clickbait classification models to identify clickbait headlines. Instead of following the traditional bag-of-words and hand-crafted feature set approaches, we take a more recent deep learning path that does not require feature engineering. Specifically, we use a distributed subword embedding technique [6, 7] to transform the words in the corpus into embeddings, and we evaluate the accuracy of the resulting classifier on a labeled dataset. We use this model to analyze a larger dataset, a collection of approximately million Facebook posts created during 2014–2016 by mainstream and unreliable media organizations. In addition to identifying the clickbait headlines in the corpus, we also use the embeddings to measure the distance between the headline and the first paragraph, known as the intro, of a news article. We use a word co-occurrence based topic model that learns topics by modeling word-word co-occurrence patterns (e.g., biterms) to understand the distribution of topics in the clickbait and non-clickbait contents of each media category. Finally, using the data on Facebook reactions, comments, and shares, we analyze the role clickbait plays in user engagement and information spread. The main contributions of this paper are:
• We collect a large data corpus of million Facebook posts by over U.S. based media organizations. Details of the corpus are explained in Section II. We make the corpus available for research purposes (URL will be added after acceptance).
• We prepare distributed subword based embeddings for the words present in the corpus. In Section III, we compare these word embeddings with the word2vec [8, 9] embeddings created from the Google News dataset with respect to clickbait detection. We plan to make these embeddings publicly available upon acceptance of the paper.
• We perform a detailed analysis of the clickbait practice on the social network from multiple perspectives. Section IV presents qualitative, quantitative and impact analyses of clickbait and non-clickbait contents.
We use two datasets in this paper. Below, we describe the datasets and explain the collection process.
Headlines: This dataset is curated by Chakraborty et al. [2]. It contains headlines of news articles which appeared in ‘WikiNews’, ‘New York Times’, ‘The Guardian’, ‘The Hindu’, ‘BuzzFeed’, ‘Upworthy’, ‘ViralNova’, ‘Thatscoop’, ‘Scoopwhoop’, and ‘ViralStories’ (https://github.com/bhargaviparanjape/clickbait/tree/master/dataset). Each of these headlines is manually labeled as either clickbait or non-clickbait by at least three volunteers. There are clickbait headlines and non-clickbait headlines in this dataset. We used this labeled dataset to develop an automatic clickbait classification model (details in Section III). An earlier version of this dataset was used in [2, 10]. It had manually labeled headlines with an even distribution of clickbait and non-clickbait headlines.
Media Corpus: For large-scale analysis, we used the Facebook Graph API (https://developers.facebook.com/docs/graph-api) to collect all the Facebook posts created by a set of mainstream and unreliable media within January , 2014 – December , 2016. The mainstream set consists of the most circulated print media (https://en.wikipedia.org/wiki/List_of_newspapers_in_the_United_States) and the most-watched broadcast media (www.indiewire.com/2016/12/cnn-fox-news-msnbc-nbc-ratings-2016-winners-losers-1201762864/), according to Nielsen ratings [11]. The unreliable set is a collection of conspiracy, clickbait, satire and junk science based media organizations. The category of each unreliable media organization is cross-checked by two sources [12, 13]. Figure 2 shows the number and percentage of media organizations in each category in the dataset. Overall, we collected more than million Facebook posts. A Facebook post may contain a photo, a video, or a link to an external source. In this paper, we limit ourselves to the link and video type posts only. This reduces the corpus size to million. For each post, we collect the headline (title of a video or headline of an article) and the status message. For a collection of link type posts, we also collected the bodies of the corresponding news articles. All these contents (headlines, messages, bodies) were used to train domain-specific word embeddings (details in Section III). We also gather the Facebook reaction (Like, Love, Haha, Wow, Sad, Angry) statistics of each post. Table I shows the distribution of the corpus.
III Clickbait Detection
The key purpose of this study is to systematically quantify the extents to which traditional print and broadcast media as well as “alternative” media – often portrayed as unreliable – use clickbait properties in contents published on the web. The first step towards that goal is to identify clickbait and non-clickbait headlines.
III-A Problem Definition
We define the clickbait identification task as a supervised binary classification problem where the set of classes is $C = \{\text{clickbait}, \text{non-clickbait}\}$. Formally, given $X$, the set of all sentences, and a training set $S$ of labeled sentences $\langle s, c \rangle$, where $\langle s, c \rangle \in X \times C$, we want to learn a function $\gamma$ such that $\gamma : X \rightarrow C$; in other words, it maps sentences to $C$. In the following sections, we describe the modeling of the problem and compare the performance of multiple learning techniques.
III-B Problem Modeling
In text classification, a traditional approach is to use a bag-of-words (BOW) model to transform text into feature vectors before applying learning algorithms. Chakraborty et al. [2] followed this approach and used a BOW model along with a collection of hand-crafted rules to prepare the feature set. However, inspired by the recent success of deep learning methods in text classification, we use distributed subword embeddings as features instead of applying a BOW model. Specifically, we use an extension of the continuous skip-gram model that takes into account subword (substring of a word) information [6]. We call this model Skip-Gram. Below, we explain how Skip-Gram is used to generate word embeddings.
Given a large corpus $W$, represented as a sequence of words $W = (w_1, \ldots, w_T)$, the objective of the skip-gram model is to maximize the log-likelihood

$$\sum_{t=1}^{T} \sum_{j \in N_t} \log p(w_j \mid w_t)$$

where the context $N_t$ is the set of indices of words surrounding $w_t$. In other words, given a word $w_t$, the model wants to maximize the correct prediction of its context $w_j$. The probability of observing a context word $w_j$ given $w_t$ is parametrized using the word vectors. The output of the model is an embedding for each word which captures semantic and contextual information of the word. Skip-Gram works in a slightly different way. Rather than treating each word as a unit, it breaks words down into subwords and tries to correctly predict the context subwords of a given subword. This extension allows representations to be shared across words, and thus allows reliable representations to be learned for rare words. Consider the following example.
“the quick brown fox jumped over the lazy dog”: take the word “quick” as an example. Assuming a subword length of three, the subwords are {qui, uic, ick}. The Skip-Gram model learns to predict qui and ick in the context, given uic as the input.
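The subword decomposition in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names `char_ngrams` and `skipgram_pairs` are hypothetical, a single subword length is used, and neighbouring subwords within one word stand in for the context. The subword model of [6] additionally adds word boundary markers and uses a range of subword lengths, which this sketch ignores.

```python
def char_ngrams(word, n=3):
    """Return all contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def skipgram_pairs(units, window=1):
    """Pair every unit (input) with each neighbour within `window` positions (context)."""
    pairs = []
    for i, target in enumerate(units):
        lo, hi = max(0, i - window), min(len(units), i + window + 1)
        pairs.extend((target, units[j]) for j in range(lo, hi) if j != i)
    return pairs


subwords = char_ngrams("quick", 3)
print(subwords)                  # ['qui', 'uic', 'ick']
print(skipgram_pairs(subwords))  # (input, context) training pairs
```

With `uic` as the input, the generated pairs ask the model to predict `qui` and `ick`, which mirrors the example in the text.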
Figure 3 shows the architecture of the Skip-Gram model. Using a neural network, the model learns the mapping between the output and the input. The weights to the hidden layer form the vector representations of the subwords. The embedding of a word is formed by the sum of the vector representations of its subwords. Formally, given a word $w$ and its set of subwords $G_w$, we can calculate the embedding of $w$ using the following equation:

$$v_w = \sum_{g \in G_w} z_g$$

where $v_w$ is the embedding of $w$ and $z_g$ is the vector representation of $g$. Further details of the Skip-Gram model can be found in [6].
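As a rough illustration of how a word embedding is composed from subword vectors, the sketch below sums toy two-dimensional vectors. The values in `subword_vectors` and the function name `word_embedding` are hypothetical; in a trained model the vectors are learned and have many more dimensions.

```python
# Hypothetical 2-dimensional subword vectors; a real model learns these during training.
subword_vectors = {
    "qui": [0.5, -1.0],
    "uic": [1.5, 0.25],
    "ick": [-1.0, 0.5],
}


def word_embedding(word, vectors, n=3):
    """Sum the vectors of all length-n subwords of `word` that appear in `vectors`."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for i in range(len(word) - n + 1):
        gram = word[i:i + n]
        if gram in vectors:  # subwords without a vector are skipped in this sketch
            total = [t + v for t, v in zip(total, vectors[gram])]
    return total


print(word_embedding("quick", subword_vectors))  # [1.0, -0.25]
```

Because the vectors are attached to subwords rather than whole words, a word that was rarely seen (say, "quicker") still receives a usable embedding from the subwords it shares with more frequent words.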
III-B2 Pre-trained Vectors
Note that Skip-Gram does not require labeled sentences to learn the embeddings of the words in a corpus. This means that one can apply the model to any large corpus of text to learn the word embeddings, irrespective of whether the corpus is labeled or not. Learning from a large text corpus in this way yields richer word embeddings which capture a lot of semantic, conceptual and contextual information. We use the texts (headlines, messages, bodies) from the Media Corpus to learn word embeddings using this model. In Section III-C, we compare our pre-trained vectors with word vectors trained on about 100 billion words from the Google News dataset.
For a labeled sentence $\langle s, c \rangle$, we average the embeddings of the words present in $s$ to form the hidden representation of $s$. These sentence representations are used to train a linear classifier. Specifically, we use the softmax function to compute the probability distribution over the classes in $C$. Joulin et al. [7] describe the classification process in detail.
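The classification step described above (average the word embeddings, apply a linear layer, then softmax) can be sketched as follows. All names and numbers here are hypothetical toy values chosen for illustration; the actual weights are learned from the labeled headlines, and this is not the authors' implementation.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of the tokens that have one."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict_proba(tokens, embeddings, weights, bias):
    """Linear layer over the averaged sentence vector, followed by softmax."""
    h = sentence_vector(tokens, embeddings, len(weights[0]))
    scores = [sum(w_i * h_i for w_i, h_i in zip(w, h)) + b
              for w, b in zip(weights, bias)]
    return softmax(scores)


# Toy 2-dimensional embeddings and classifier weights (assumed, not learned).
embeddings = {"you": [1.0, 0.0], "wont": [1.0, 1.0],
              "believe": [1.0, 2.0], "senate": [-1.0, 0.0]}
weights = [[1.0, 0.0],   # row 0 scores the clickbait class
           [-1.0, 0.0]]  # row 1 scores the non-clickbait class
bias = [0.0, 0.0]

probs = predict_proba(["you", "wont", "believe"], embeddings, weights, bias)
print(probs)  # probability of [clickbait, non-clickbait]
```

Averaging keeps the sentence representation the same size regardless of headline length, which is what allows a single linear classifier to be trained on top of it.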
We use the Headlines dataset to evaluate our classification model. Section II provides the description of the dataset. We perform 10-fold cross-validation to evaluate various methods with respect to accuracy, precision, recall, f-measure, area under the ROC curve (ROC-AUC) and Cohen’s κ. Table II shows the performance of the methods. To reduce the effect of randomness, we perform each experiment times and present the average. There are seven methods in total. We categorize them based on their use of pre-trained vectors. Note that we report the performance of Chakraborty et al. [2] and Anand et al. [10] in the table. We group Anand et al. with the methods which use pre-trained vectors, because they used word embeddings trained on about 100 billion words from the Google News dataset using the Continuous Bag of Words architecture. Each word embedding has dimensions. Both of these works [2, 10] used a smaller and earlier version of the Headlines dataset. Moreover, the training and test sets of the earlier dataset are not available, so we could not compare our methods with them on the same test bed.
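For readers less familiar with these measures, the sketch below computes accuracy, precision, recall, f-measure and Cohen’s κ from the four cells of a binary confusion matrix. The counts are hypothetical and the function name `binary_metrics` is an assumption; ROC-AUC is omitted because it requires classifier scores rather than hard predictions.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "kappa": kappa}


# Hypothetical counts for one cross-validation fold, with clickbait as the positive class.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

In a 10-fold setup, one would compute these values on each held-out fold and average them across folds.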
The Skip-Gram model, even without pre-trained vectors, significantly outperforms the BOW-based Chakraborty et al. It achieves an f-measure score of ( higher than Chakraborty et al.) and a score of . Powered with the pre-trained vectors, Skip-Gram performed even better. We used the same Google News word embeddings as well as embeddings learned from our own Media Corpus. Regarding the latter, we experimented with three combinations: pre-trained vectors learned from the headlines only, from headlines and messages, and from headlines, bodies and messages. We set the embedding size to dimensions while learning from these combinations. For the methods which were applied to the full Headlines dataset, we highlight the top performance in each column. Skip-Gram with pre-trained vectors from headlines, bodies and messages performed the best among all the variations. We realize that the differences in the measure values among the methods are small. However, we consider even a small improvement significant while working above the range.
The Media Corpus has unique embeddings, whereas the Google News dataset provided billion embeddings. One interesting observation is that, even though our Media Corpus is significantly smaller than the Google News dataset, it contributes more to the clickbait classification task. A plausible explanation is that the embeddings from the Media Corpus carry more domain-specific knowledge than those from the Google News dataset. We plan to extend this corpus with more Facebook posts and release it along with the pre-trained vectors for research purposes upon acceptance of the paper.
With this clickbait classification model [Skip-Gram + (Headline + Body + Message)], we move forward and perform a large-scale study of the clickbait practice by a range of media on the social network (Facebook).
Table II: Performance of the clickbait classification methods.
| Category | Method | Precision | Recall | F-measure | Accuracy | Cohen’s κ | ROC-AUC |
| Without Pre-trained Vectors | *Chakraborty et al. [2] | 0.95 | 0.90 | 0.93 | 0.93 | NA | 0.97 |
| With Pre-trained Vectors | *Anand et al. [10] | 0.984 | 0.978 | 0.982 | 0.982 | NA | 0.998 |
| | Skip-Gram + (Headline + Message) | 0.982 | 0.982 | 0.982 | 0.982 | 0.964 | 0.982 |
| | Skip-Gram + (Headline + Body + Message) | 0.983 | 0.983 | 0.983 | 0.983 | 0.965 | 0.983 |
* Their experiments were performed on a smaller and earlier version of the Headlines dataset.
IV Practice of using clickbait in Social Network
We analyze the clickbait practice on Facebook using the Media Corpus from three perspectives.
IV-A Quantitative Analysis
To understand the extent of clickbait practice by different media and their categories, we applied the clickbait detection model to their contents, particularly to the headline/title of the link/video type posts. From now onward, we use the term headline to denote both the headline of a link content (article) and the title of a video content. Table III shows the amounts of clickbait and non-clickbait headlines of mainstream and unreliable media. Out of posts by mainstream media, have clickbait headlines. In unreliable media, the ratio is ( clickbait headlines out of ). Based on these statistics, the percentage appears to be surprisingly high for the mainstream. We zoom into the categories of these two media to analyze the primary proponents of the clickbait practice. We find that between the two categories of mainstream media, broadcast uses clickbait of the time whereas print uses only . We further zoom in to understand the high percentage in the broadcast category. The Media Corpus has broadcast media. We manually categorize them into news oriented broadcast media (e.g. CNN, NBC, etc.) and non-news (lifestyle, entertainment, sports, etc.) broadcast media (e.g. HGTV, E!, etc.). There are news oriented broadcast media and non-news broadcast media. We find that the ratio of clickbait and non-clickbait is in non-news type broadcast media whereas it is only (close to print media) in news oriented media. Figure 5 shows the kernel density estimation of the clickbait percentage for both news and non-news broadcast media. It clearly shows the difference in clickbait practice between these two sub-categories. Most of the news type broadcast media have about clickbait contents. On the other hand, the percentage of clickbait for non-news type broadcast media has a wider range, with a peak at about . In the case of unreliable media, unsurprisingly, all the categories have a high percentage of clickbait in their headlines. In Figure 5, we show the percentage of clickbait in video and link type posts for each of the media categories. Satire leads in both link and video type posts. Print and conspiracy media have the lowest clickbait practice among all the media categories in link and video type posts, respectively. Table V shows the top- clickbait proponents in each media category.
Topic distribution: To understand the topics in the clickbait and non-clickbait contents, we applied topic modeling to all the headlines of each category. One concern about applying traditional topic modeling algorithms (e.g. Latent Dirichlet Allocation, Latent Semantic Analysis) to our corpus is that they rely on document-level word co-occurrence patterns to discover the topics of a document. They may therefore struggle with the sparsity of word co-occurrence patterns, which becomes a dominant factor in shorter contexts. That is why we use Biterm Topic Modeling (BTM), which generates the topics by directly modeling the aggregated word co-occurrence patterns of short documents.
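To make the notion of a biterm concrete, the sketch below extracts the unordered word pairs that co-occur within each short document and aggregates them over a toy corpus. The headlines and the function name `biterms` are hypothetical, and only the biterm extraction is shown; the topic inference that BTM performs over the aggregated biterms is not part of this sketch.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return every unordered word pair co-occurring in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical, already tokenized headlines.
headlines = [
    ["trump", "clinton", "campaign"],
    ["clinton", "campaign", "news"],
]

# Aggregate biterms over the whole corpus rather than per document.
biterm_counts = Counter(b for tokens in headlines for b in biterms(tokens))
print(biterm_counts.most_common(3))
```

Aggregating over the whole corpus is what sidesteps the sparsity problem: a single headline contributes only a handful of pairs, but the corpus-wide counts are dense enough to reveal recurring word associations.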
Table IV: Topics in clickbait and non-clickbait contents for each media category.
| Media | Clickbait | Non-clickbait |
| Print | best, thing, day, new, 2015, cleveland, la, 2016, know, year | new, san, la, jose, police, county, vega, get, bay, school |
| | trump, woman, donald, new, get, say, make, people, thing, know | police, man, cleveland, new, killed, woman, la, shooting, shot, get |
| | trump, new, get, woman, donald, make, star, say, man, chicago | news, trump, new, man, say, york, woman, hawaii, police, killed |
| | new, best, thing, year, get, kid, day, woman, make, trump | trump, new, u, clinton, say, state, win, donald, take, world |
| | boston, trump, donald, new, say, make, clinton, woman, get, 2016 | boston, new, say, trump, sox, chronicle, win, red, get, state |
| Broadcast | new, movie, star, make, swift, time, video, best, get, like | police, man, new, found, woman, killed, arrested, say, shooting, death |
| | new, get, baby, kardashian, jenner, star, first, make, love, say | trump, clinton, say, new, obama, u, gop, news, campaign, hillary |
| | woman, episode, new, trump, man, black, get, video, full, girl | new, u, say, police, found, killed, dead, nbc, year, dy |
| | trump, history, know, thing, donald, clinton, get, make, best, say | win, new, say, game, first, get, team, player, take, back |
| | day, photo, national, way, best, like, food, dog, thing, geographic | national, geographic, photo, new, shark, day, classic, fs1undisputed, home, found |
| Unreliable | trump, hillary, donald, clinton, obama, get, make, say, one, news | obama, eagle, muslim, police, say, gun, u, cop, man, patriot |
| | video, people, american, black, obama, muslim, u, america, cop, white | trump, hillary, clinton, obama, new, say, campaign, news, donald, republican |
| | chick, trump, eagle, right, woman, hillary, say, get, people, make | u, obama, video, war, isi, new, military, american, world, muslim |
| | man, people, thing, woman, make, year, like, get, way, new | new, truth, obama, say, u, republican, police, broadcast, man, american |
| | day, reunionfather, human, food, way, health, thing, reason, life, make | human, cancer, health, new, vaccine, u, study, food, found, world |
Table IV shows topics in clickbait and non-clickbait contents for each media category. Each topic is represented by a set of words. The words are ordered by their significance in the corresponding topic. The modeling indicates that clickbait headlines in print and broadcast media vary in tones and subject matters from their non-clickbait headlines to a great extent. Clickbait headlines in these media represent more personalized, sensationalized and entertaining topics, while non-clickbait headlines highlight topics of collective problems such as public policies and civic affairs. But this variation is not much evident in unreliable media that use clickbait headlines indiscriminately across all topics.
The model highlights some differences in clickbait topics between print and broadcast media. Most clickbait topics in print media, four out of five, are about U.S. President Donald Trump’s views on women. Each of these four topics includes all of these four words: Trump, woman, make, new. A manual search shows that print news media often used clickbait techniques (e.g., question-based headlines) in stories about Trump and women. For instance, “Did Donald Trump really say those things?” was the headline of a Washington Post article dated July 25, 2016. The headline of a New York Times story from May 14, 2016, reads: “Crossing the Line: How Donald Trump Behaved With Women in Private.”
Most clickbait topics in broadcast media are about entertainment (e.g., Taylor Swift’s new music video; Kardashian’s new baby) and lifestyle (e.g., food and health). Two topics appear to touch on Donald Trump and his opponent Hillary Clinton. Clickbait topics in unreliable media, however, range from politics to lifestyle. At least three topics appear to be about politics, with key words including Trump, Hillary, Obama, Muslim, Cop, and Woman. One topic is about food and health while another is unclear.
Non-clickbait topics remain similar across all three media types, which primarily focus on law and order, and U.S. presidential election campaign. Twelve out of 15 topics – all five in print, three in broadcast, and four in unreliable – are about these two areas. One broadcast topic appears to be about sports and one is unclear. One unreliable topic is about food and health.
Headline-Body similarity: One limitation of Skip-Gram is that it considers only the headline when determining whether it is clickbait. The body of the news is not considered as a factor in defining the headline. An attractive headline can be highly relevant to the body of a news article, or it can be only loosely related to it. Our model is not capable of making this distinction. A metric is required to measure the similarity between the headline and the content to determine whether the headline fairly represents the content. In future, we want to systematically incorporate headline-body similarity into the definition of clickbaitiness. Nonetheless, here we measure how similar clickbait and non-clickbait headlines are to the corresponding bodies using a simple approach. We assume that the first paragraph of an article summarizes the whole news and use cosine similarity to measure the similarity between the headline and the sentences in the first paragraph. We use a bag-of-words model to transform the sentences into vectors before applying cosine similarity. In future, we plan to use our word embeddings to create the vectors instead. Figure 7 shows the kernel density estimation of the headline-body similarity in clickbait and non-clickbait contents posted by different media. One observation is that, in print media, non-clickbait headlines are closer to their summary than clickbait headlines. In broadcast media, the difference is less clear, and in unreliable media the difference is almost absent.
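A minimal sketch of this headline-to-intro comparison is given below, using bag-of-words count vectors and cosine similarity. The tokenization (lowercasing and whitespace splitting), the function names, and the example texts are assumptions made for illustration; how the per-sentence similarities are aggregated into a single score is not specified in the text, so the sketch simply returns one value per sentence.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words count vector from lowercased, whitespace-split tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def headline_intro_similarities(headline, intro_sentences):
    """Similarity of the headline to each sentence of the first paragraph."""
    h = bow(headline)
    return [cosine(h, bow(sentence)) for sentence in intro_sentences]


# Hypothetical headline and first-paragraph sentences.
sims = headline_intro_similarities(
    "Police arrest man",
    ["Police arrest a man downtown.", "The senate votes today."],
)
print(sims)
```

A headline that shares most of its words with the intro scores close to 1, while a headline with no overlap scores 0, which is the behaviour the density plots summarize per media category.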
| New York Post | 11977 | 13910 | 46.27 |
| Dallas Morning News | 3982 | 8232 | 32.6 |
| Chicks on the Right | 14185 | 4977 | 74.03 |
| Media | Category | Clickbait Status | Non-clickbait Link | Clickbait Status (%) |
To measure the reach and user engagement of clickbait and non-clickbait contents, we use Facebook reactions, comments and shares as metrics. Figure 8 shows the number of comments, shares and reactions (summation of like, haha, wow, sad, angry, happy, love) of an average clickbait and non-clickbait post in each media category. Blue areas indicate that, on average, a clickbait post (link or video) receives more attention (reactions/shares/comments) than a non-clickbait post. Green areas indicate the opposite. In general, clickbait contents receive more attention and reach more users. One exception is the broadcast media.
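The per-category averages behind such a comparison can be computed with a simple grouping step, sketched below. The post records, field names, and the function name `mean_engagement` are hypothetical and serve only to illustrate the aggregation; they are not data from the corpus.

```python
from collections import defaultdict

# Hypothetical post records; the field names are assumptions for illustration.
posts = [
    {"category": "print", "clickbait": True, "reactions": 120, "comments": 30, "shares": 10},
    {"category": "print", "clickbait": True, "reactions": 80, "comments": 10, "shares": 30},
    {"category": "print", "clickbait": False, "reactions": 40, "comments": 8, "shares": 5},
    {"category": "broadcast", "clickbait": False, "reactions": 200, "comments": 50, "shares": 60},
]


def mean_engagement(posts, metric):
    """Average `metric` per (media category, clickbait flag) group."""
    sums = defaultdict(lambda: [0, 0])  # group -> [running total, post count]
    for post in posts:
        key = (post["category"], post["clickbait"])
        sums[key][0] += post[metric]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


print(mean_engagement(posts, "shares"))
```

Comparing the clickbait and non-clickbait averages within each category, metric by metric, gives the kind of contrast described for Figure 8.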
We also analyze how often a news article is re-posted on Facebook. Figure 6 shows the number of times a link is re-posted by a media organization. Each bar represents a news link. The height indicates how many times this link was posted on Facebook by the colored media category. We only consider the links which were re-posted at least 20 times. Compared to others, conspiracy media organizations repeat the same link more often. This is observed for both clickbait and non-clickbait links. Clickbait media seem to repeatedly post the same clickbait links more than others.
Beyond headlines, media organizations also use clickbait in the Facebook status message itself. Table VI shows the use of clickbait status messages for non-clickbait articles by different media. A general observation is that media try to lure readers with clickbaity status messages even for non-clickbaity news contents. Unsurprisingly, the clickbait media category leads in this practice.
V Related Work
Even though clickbait is a relatively nascent term, its traces can be found in several journalistic concepts such as tabloidization and content trivialization. The linguistic techniques and presentation styles typically employed in clickbait headlines and articles derive from the tabloid press, which baits readers with sensational language and appealing topics such as celebrity gossip, humor, fear and sex. Clickbait articles are also similar to tabloid press articles in terms of story focus, which puts emphasis on the entertaining elements of an event rather than the informative elements. The Internet, and especially social media, have made it easier for clickbait practitioners to create and publish content on a larger scale and to reach a broader audience faster than before. In the last several years, academics and media organizations have studied this phenomenon from several perspectives.
Clickbait Properties, Practice and Effects: There have been a small number of studies, some conducted by academic researchers and others by media firms, which examined correlations between headline attributes and the degree of user engagement with content. Some media market analysts and commentators have discussed various aspects of this practice. However, we found no research that gauges the extent of clickbait practices by mainstream and alternative media outlets on the web. Nor have we found any study that examined whether clickbait techniques help increase user engagement on social media.
A journalism professor [1] manually examined the content of four online sections of the Spanish newspaper El Pais (http://elpais.com), which apparently used clickbait features to capture attention. The corpus included only articles published in June, 2015. The articles in the corpus appeared to emphasize anecdotal aspects, issues with little value, and curiosities. The study identified various linguistic techniques used in the headlines of these articles, such as orality markers and interaction (e.g., direct appeal to the reader), vocabulary and word games (e.g., informal language, generic words or buzzwords), and morphosyntax (e.g., simple structures).
Researchers at the University of Texas’s Engaging News Project [5] conducted an experiment on U.S. adults to examine their reactions to clickbait (e.g., question-based headlines) and traditional news headlines in political articles. They found that clickbait headlines led to more negative reactions among users than non-clickbait headlines. Interestingly, the same users were slightly more engaged with non-traditional media that tend to use clickbait techniques more often. This finding questions the conventional belief that user reactions may predict user engagement, and warrants large-scale investigations.
Chartbeat, an analytics firm that provides market intelligence to media organizations, tested headlines from over 100 websites for their effectiveness in engaging users with content. The study examined ‘common tropes’ in headlines, a majority of which are considered clickbait techniques, and found that some of these tropes are more effective than others. Some media pundits interpreted the findings of this study as evidence that clickbait is detrimental to traditional news brands.
HubSpot and Outbrain, two content marketing platforms that distribute clickbait contents across the web, examined millions of headlines to identify attributes that contribute to traffic growth, increased engagement, and conversion of readers into subscribers. The study suggested that clickbait techniques may increase engagement temporarily, but an article must deliver on the promises made in its headline for users to return and convert.
One prior work uses Logistic Regression to create a supervised clickbait detection model. It treats all Buzzfeed and Clickhole headlines as clickbait and all NYT headlines as non-clickbait. We argue that this makes the model susceptible to personal bias, as it overlooks the fact that many Buzzfeed contents are original and non-clickbaity and that there is clickbait practice in NYT. Moreover, BuzzFeed and NYT usually write on very different topics, so the model might have been trained merely as a topic classifier. Another work attempts to detect clickbaity tweets on Twitter by using common words occurring in clickbait and by extracting some tweet-specific features. Chakraborty et al. [2] use a dataset of manually labeled headlines to train several supervised models for clickbait detection. These methods depend heavily on a rich set of hand-crafted features, which take a good amount of time to engineer and are sometimes specific to the domain (for example, tweet-related features are specific to Twitter data and inapplicable to other domains). Anand et al. [10] present a clickbait detection model which uses word embeddings and a Recurrent Neural Network (RNN). These works consider the structure and semantics of a headline to determine whether it is clickbait or not. However, one important aspect, the body of the news, is not considered as a factor in these works at all. We argue that the headline alone does not fully determine whether an article is clickbait. If a headline represents the body fairly, it should not be considered clickbait. Consider the title ‘The Top 10 Mistakes Of Entrepreneurs’ (www.forbes.com/sites/roberthof/2016/02/23/guy-kawasaki-the-top-10-mistakes-of-entrepreneurs) as an example. It is about as clickbait as a headline can be. However, the body actually contains reasonably decent material, which might be interesting to many people.
In this paper, we introduce a word-embedding based clickbait detection system which is built on our own collected corpus of news headlines and contents. We showed that embeddings learned from our corpus perform better than the Google News dataset based embeddings. Our analysis also reveals how mainstream media are increasingly getting involved in the clickbait practice. Close scrutiny of the social media posts also reveals that broadcast media use clickbait at a higher rate than print media, and that non-news broadcast media contribute most to this. Our study also confirms the unsurprising fact that unreliable media use clickbait at a higher rate. Moreover, results from our topic modeling indicate that clickbait practice is prevalent in personalized and entertaining areas. In future, we want to incorporate the content of the news in defining the clickbaitiness of a headline. We believe such a system would help social networking platforms curb the problem of clickbait and provide a better user experience.
-  D. Palau-Sampio, “Reference press metamorphosis in the digital context: clickbait and tabloid strategies in elpais.com,” Communication & Society, vol. 29, no. 2, 2016.
-  A. Chakraborty, B. Paranjape, S. Kakarla, and N. Ganguly, “Stop clickbait: Detecting and preventing clickbaits in online news media,” in Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on. IEEE, 2016, pp. 9–16.
-  M. S. Luckie, “Adele and the death of clickbait,” http://www.niemanlab.org/2015/12/adele-and-the-death-of-clickbait/, 2015.
-  C. Sutcliffe, “Can publishers step away from the brink of peak content?” https://www.themediabriefing.com/article/can-publishers-step-away-from-the-brink-of-peak-content, 2016.
-  J. M. Scacco and A. Muddiman, “Investigating the influence of “clickbait” news headlines,” https://engagingnewsproject.org/wp-content/uploads/2016/08/ENP-Investigating-the-Influence-of-Clickbait-News-Headlines.pdf, 2016.
-  P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
-  A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” arXiv preprint arXiv:1607.01759, 2016.
-  T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, 2013, pp. 3111–3119.
-  T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
-  A. Anand, T. Chakraborty, and N. Park, “We used neural networks to detect clickbaits: You won’t believe what happened next!” arXiv preprint arXiv:1612.01340, 2016.
-  J. Eggerton, “FCC: Nielsen DMAs still best definition of TV market,” Broadcasting & Cable, 2016.
-  informationisbeautiful.net, “Unreliable/fake news sites & sources,” https://docs.google.com/spreadsheets/d/1xDDmbr54qzzG8wUrRdxQl_C1dixJSIYqQUaXVZBqsJs, 2016.
-  M. Zimdars, “My ‘fake news list’ went viral. but made-up stories are only part of the problem,” https://www.washingtonpost.com/posteverything/wp/2016/11/18/my-fake-news-list-went-viral-but-made-up-stories-are-only-part-of-the-problem, 2016.
-  X. Yan, J. Guo, Y. Lan, and X. Cheng, “A biterm topic model for short texts,” in Proceedings of the 22nd international conference on World Wide Web. ACM, 2013, pp. 1445–1456.
-  R. S. Izard, H. M. Culbertson, and D. A. Lambert, Fundamentals of news reporting. Kendall/Hunt Publishing Company, 1994.
-  M. Ingram, “The internet didn’t invent viral content or clickbait journalism – there’s just more of it now, and it happens faster,” https://gigaom.com/2014/04/01/the-internet-didnt-invent-viral-content-or-clickbait-journalism-theres-just-more-of-it-now-and-it-happens-faster, 2014.
-  F. Filloux, “Clickbait is devouring journalism but there are ways out,” https://qz.com/648845/clickbait-is-devouring-journalism-but-there-are-ways-out/, 2016.
-  C. Breaux, “You’ll never guess how chartbeat’s data scientists came up with the single greatest headline,” http://blog.chartbeat.com/2015/11/20/youll-never-guess-how-chartbeats-data-scientists-came-up-with-the-single-greatest-headline, 2015.
-  Hubspot and Outbrain, “Data driven strategies for writing effective titles & headlines,” http://cdn2.hubspot.net/hub/53/file-2505556912-pdf/Data_Driven_Strategies_For_Writing_Effective_Titles_and_Headlines.pdf.
-  M. Potthast, S. Köpsel, B. Stein, and M. Hagen, “Clickbait detection,” in European Conference on Information Retrieval. Springer, 2016, pp. 810–817.
-  A. Thakur, “Identifying clickbaits using machine learning,” https://www.linkedin.com/pulse/identifying-clickbaits-using-machine-learning-abhishek-thakur, 2016.
-  N. Hurst, “To clickbait or not to clickbait? an examination of clickbait headline effects on source credibility,” Ph.D. dissertation, University of Missouri–Columbia, 2016.
-  L. Eidnes, “Auto-generating clickbait with recurrent neural networks,” https://larseidnes.com/2015/10/13/auto-generating-clickbait-with-recurrent-neural-networks, 2015.
-  C. Cha, “clickbait generator,” http://www.thisisreallyreal.com/, 2016.
-  “Linkbait title generator,” http://www.contentrow.com/tools/link-bait-title-generator.