(Im)balance in the Representation of News? An Extensive Study on a Decade Long Dataset from India

by Souvic Chakraborty, et al.

(Im)balance in the representation of news has always been a topic of debate in political circles. The concept of balance has often been discussed and studied in the context of the social responsibility theory and the prestige press in the USA. While various qualitative as well as quantitative measures of balance have been suggested in the literature, a comprehensive analysis of all these measures across a large dataset of the post-truth era, comprising different popular news media houses and covering a sufficiently long temporal scale in a non-US democratic setting, is lacking. We use this concept of balance to measure and understand the evolution of imbalance in Indian media on various journalistic metrics on a month-by-month basis. For this study, we amass a huge dataset of over four million political articles from India for 9+ years and analyze the extent and quality of coverage given to issues and political parties in the context of contemporary influential events for three leading newspapers. We use several state-of-the-art NLP tools to effectively understand political polarization (if any) manifesting in these articles over time. We find that two out of the three news outlets are more strongly clustered in their imbalance metrics. We also observe that only a few locations are extensively covered across all the news outlets and the situation is only slightly getting better for one of the three news outlets. Cloze tests show that the changing landscape of events is reflected in all the news outlets, with border and terrorism issues dominating around 2010 while economic aspects like unemployment, GST, demonetization, etc. became more dominant in the period 2014 – 2018. Further, cloze tests clearly portray the changing popularity profile of the political parties over time.




I Introduction

Indian media has been the target of sharp international criticism for propagating one-sided views on specific issues and for deliberately introducing imbalance and sensationalism in news reporting (https://www.aljazeera.com/opinions/2020/2/24/indias-media-is-failing-in-its-democratic-duty). Specific media houses have been consistently targeted by the supporters of particular parties for propagating unbalanced views on issues benefiting some specific party. Many of these unverified news items and systematic imbalances have been criticized for fomenting communal disharmony and even causing loss of lives.

We therefore study how (im)balanced the Indian news media is and how this has changed over time, under different ruling parties at the centre and in the face of large-scale events. These events include the nation-wide, non-political anti-corruption movement of 2011, which had a mass following and a significant impact on later political discourse; the national election of 2014, with almost a billion voters and a significant change in the parties' share of seats; and large-scale changes in monetary policy such as demonetisation and the introduction of GST in 2016 and 2017.

While popularity of sharebaits is a problem, a bigger problem is perhaps the reliance of the common mass on social media to get their daily news feeds [32]. There is an increasing lack of diversity in the algorithmically driven news feeds that the social media typically presents to its readers. In many cases this leads to reinforcing the ‘bias’ among readers in the form of increased polarity of the political opinion of these readers over time [2]. This idea has been presented in the past literature in various forms like echo chambers [17, 16] and filter bubbles [44]. Had the nature or inclination of bias used by newspapers been dynamic temporally or topically, it would have been less likely to influence people at large. However, recent research [53] in bias propagation suggests that typically “static” forms of biased news are more likely to spread. So, once a specific bias is introduced by the major media houses systematically, it will act as a positive feedback loop catering to the “confirmation” of their readers and making it more difficult for the media houses to introduce a different stance contradicting the view of their readers which they themselves shaped and are identified with.

I-A Current studies and their limitations

Most of the above investigations relating to echo chambers, filter bubbles and confirmation bias deal with the end effect of introducing systematic bias in various forms. A crucial question is how this bias is introduced in the first place? Can the source of this bias be suitably quantified? We posit, in this paper, that the ‘imbalance’ in the representation of the news could be a potential source for this bias. This is motivated by the works of previous researchers on fairness in journalism who often consider bias as the opposite of accuracy, balance, and fairness  [13, 14, 27, 48, 50].

While ‘imbalance’ is a useful measure for quantifying such bias, it can have many facets and must be examined from different angles, unlike many influential works [27, 8] that examined news articles only for variations of coverage bias.

Computational studies of news media [4, 49, 28] carried out so far (apart from the manual studies done by independent journalists) mostly fall short in the number of articles examined, the time span of examination, or both. A prime limitation of these studies is that most of them take a piecemeal approach. To overcome that limitation, we examine our corpus under various lenses of bias with a variety of tools, in order to get a more nuanced view of the evolution of Indian media and society at large over the last decade.

I-B Objectives of this work

The primary objective of this paper is to examine a large body of news articles through various lenses of bias. Most of the available news media datasets are specific to the US [23, 52, 42], Europe [6, 9] or Australia [25], and are limited by their regional scope in terms of time, events, diversity of media groups, etc. So, for this study, we mine the Indian newspapers with the highest readership numbers and online availability, over a span of almost a decade. By analyzing this massive news dataset, we make efforts to understand the temporal behavior of different forms of bias across three news media outlets in the Indian context. We summarize our contributions in the following section.

I-C Key contributions

In the following we summarize our specific contributions.

  • We collect news articles for three leading Indian newspapers for a huge span of 9+ years. In addition, for every article we also separately collect various metadata like the source of the article, the headline and the URL.

  • We use seven different metrics of (im)balance that are easy to compute as well as interpret, motivated by two completely orthogonal viewpoints generally upheld by researchers or termed important by media bias gatekeeping organizations.

  • We compute these metrics for (im)balance month-wise for the three leading newspapers, observe the temporal trends for each kind of metric, cluster these time series and discuss the most interesting observations.

  • We use different NLP tools like word embedding association test and masked language modelling to answer several research questions in an India specific setting and discuss the implications of the results obtained. These investigations unfold various interesting trends in Indian political discussions over the years.

II Background

In this section we first present a brief review of fairness and balance in the journalism literature. This is followed by a narration of how we build up our work on these ideas.

(Im)balance in journalism: Fairness and balance in the press have been studied for a long time in the journalism literature. One of the early works examines how the prestige press differs from media outlets with wide circulation [27]. The authors perform a small-scale data-driven study to show that the prestige press presents a more balanced coverage of local stories than wide-circulation media outlets. Fico et al [13] study the newspaper coverage of the US in the Gulf war and find that wide-circulation newspapers were more likely to favor anti-war advocates than smaller ones. Fico et al [14] develop a content based technique to study the newspaper coverage of controversial issues. They find that only 7% of the stories were evenly balanced and that the coverage of the Gulf war issue was the most imbalanced. Fico et al [11] study structural characteristics of newspaper stories on the 1996 US presidential election. One of the most important findings of this work is that event coverage was the biggest predictor of an imbalanced story. Fico et al [12] study balance in election news coverage of the 2006 US senate elections. They observe that women reporters provided a more evenly balanced treatment of the candidates' assertions. Carter et al [5] report the structural balance in local television election coverage. Geri et al [15] study the broadcast and cable network news coverage of the 2004 presidential election and find that broadcast networks were more balanced in their aggregate attention to the presidential candidates than the cable networks. In [45], the researchers study Google searches to find partisan bias in widely used digital platforms. Morgan et al [33] study bias in social media shares of news items, while Kulshrestha et al [26] study social media bias for political searches. Finally, the authors of [44] study remedial techniques to limit exposure to bias, namely strategies for promoting diverse exposure.

The present work: In most of the above works, (im)balance has largely been quantified in terms of coverage. However, we posit that (im)balance can manifest in many different ways. In the following, we outline these forms, assimilating concepts from the past literature.

Coverage imbalance: Coverage imbalance is the extent to which some specific entities/topics are covered in the articles of a specific media house. Coverage imbalance may originate from inequality in the number of articles published with stories related to each major party, in the amount of inkspace given to each party (even if the number of articles is equal, articles covering one party can be longer), or in the amount of inkspace used to cover speeches of the leaders of each political party.

For the first two metrics of coverage imbalance, we take inspiration from the work of D’Alessio et al [8] and Lacy et al [27]. The equivalent of the number of stories featuring a party is the number of headlines featuring that party, the ideal candidate for measuring gatekeeping bias [8]. On the other hand, following the combined measure of fairness and balance in the work of Lacy et al [27], we use the sentences in the content to measure the amount of inkspace given to a particular party. Throughout the analysis we use words as the analog of inkspace, since words are the basic units of semantic information in online media.

Following the work of Lin et al [29], we introduce another measure of coverage imbalance, the point of view imbalance. The point of view from which a news story is reported matters, as editors have to choose among viewpoints given the constraint on the space/number of words available in an article of readable length. While one newspaper may choose to quote government sources, another may report the same story quoting the opposition leader. Thus, imbalance is introduced if diverse viewpoints are not equally represented, as discussed in the guidelines for fair reporting by FAIR (https://www.fair.org). We try to capture the sentences where the speech of any person affiliated to a party is reported and calculate the number of words (the semantic equivalent of inkspace), just as in the previous cases of coverage imbalance.

Tonality imbalance: The choice of words matters in the presentation of news. The same piece of news can be presented with positive sentiment toward a specific entity, or in a way that portrays that entity in a negative light. The views of the editors are often politically biased, as is evident from the “opinion” columns of newspapers, and can also be reflected in the news stories they cover. So, it is important to check for tonal imbalance in news stories.

Different distributions of sentiments for different topics are often used [54] to detect the bias and credibility of news sources. We can use similar measures to capture tonality imbalance. We do so in two ways here: by measuring the density of positive and negative sentiments in the text, and by measuring the overall density of subjectivity of the text. We use the average sentiment/subjectivity over all sentences that talk about a specific party, weighted by the number of words, so that a long sentence contributes to the metric in proportion to the inkspace it occupies.

In addition to the introduction of different perspectives on imbalance as above, we also upscale our study in two other major directions. First, we perform the analysis on a huge dataset comprising three Indian news outlets leading to a total of 3.86 million articles. Second, rather than aggregate statistics, we present temporal characteristics of the imbalance which allows us to make various important and nuanced observations.

III Dataset

From the list of top news media by readership, published in the Indian Readership Survey (IRS) 2017 [7] compiled by the Media Research Users Council (MRUC), we collect news article data for three popular English-language newspapers in India for which an online archive is available, namely Times of India (TOI) (https://timesofindia.indiatimes.com/archive.cms), The Hindu (https://www.thehindu.com/archive/) and India Today (https://www.indiatoday.in/archives/). By crawling through the archives, we create a date-wise repository of more than four million news articles spanning 9 years (2010–2018).

The total number of articles retrieved for TOI, The Hindu and India Today are 1,899,745, 1,032,377 & 926,922, respectively. A brief description of the year wise statistics of the political articles across the three newspapers is noted in Table I.

Year   TOI      The Hindu   India Today
2010   101903   74927       45297
2011   169183   75160       27135
2012   236548   86107       146634
2013   237372   88795       197139
2014   210859   106013      152260
2015   243473   204421      135898
2016   208861   192120      71155
2017   242334   117584      76800
2018   249212   87250       74604
TABLE I: Year-wise number of collected articles over the chosen time range for all three newspapers. Note that India Today shows a unique pattern: from 2011 to 2012 the number of articles grew roughly 5x, while from 2015 to 2016 it roughly halved (0.5x). Neither of the other outlets shows such stark shifts.

III-A Pre-processing of the raw dataset

Our data consists of the headlines of the news stories, date of publication and the content.

Keyword based shortlisting of articles: Our entire study aims to depict the political imbalance of different media houses in the representation of news related to the two major political parties of India: the ‘Bharatiya Janata Party’ (https://en.wikipedia.org/wiki/Bharatiya_Janata_Party) and the ‘Indian National Congress’ (https://en.wikipedia.org/wiki/Indian_National_Congress), better known as ‘BJP’ and ‘Congress’. We use a keyword based analysis for all the metrics. The set of keywords chosen for BJP (referred to as BJPkeywords from now on) consists of ‘Bharatiya Janata Party’, ‘BJP’, ‘Akhil Bharatiya Vidyarthi Parishad’, ‘ABVP’, ‘National Democratic Alliance’ and ‘NDA’. The set of keywords chosen for Congress (referred to as Congresskeywords from now on) consists of ‘Congress’ (the most popular version of the full name of the party), ‘INC’, ‘National Students’ Union of India’, ‘NSUI’, ‘United Progressive Alliance’ and ‘UPA’.
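The keyword based shortlisting can be sketched as follows. This is a minimal illustration and not necessarily the authors' implementation: the keyword lists come from the text, while the function names and the choice of case-sensitive, whole-word matching are assumptions made here.

```python
import re

# Keyword sets as listed in the text (a straight apostrophe is used for simplicity).
BJP_KEYWORDS = [
    "Bharatiya Janata Party", "BJP",
    "Akhil Bharatiya Vidyarthi Parishad", "ABVP",
    "National Democratic Alliance", "NDA",
]
CONGRESS_KEYWORDS = [
    "Congress", "INC",
    "National Students' Union of India", "NSUI",
    "United Progressive Alliance", "UPA",
]

def compile_keywords(keywords):
    """Build one case-sensitive, whole-word regex for a keyword list (assumed matching rule)."""
    # Longer phrases first so that they are tried before shorter alternatives.
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b")

BJP_RE = compile_keywords(BJP_KEYWORDS)
CONGRESS_RE = compile_keywords(CONGRESS_KEYWORDS)

def parties_mentioned(text):
    """Return the set of parties whose keywords occur in `text`."""
    parties = set()
    if BJP_RE.search(text):
        parties.add("BJP")
    if CONGRESS_RE.search(text):
        parties.add("Congress")
    return parties
```

Whole-word matching avoids hits inside longer tokens, and a text that mentions keywords from both sets is reported for both parties.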

Motivation for the choice of keywords: A natural question is why we chose only the above keywords for our experiments. The choice is motivated by an additional set of experiments that we ran to fix the keyword set, and we arrived at it in two steps. First, we manually checked the news articles to find the important indicators; through this qualitative analysis, we found that discussion around the parties can typically be identified by the most popular versions of their names (full names and acronyms) together with the popular acronyms of their coalitions and student organizations. Next, we added the names of personalities related to each party and observed that the results were completely swayed by the coverage of influential figures such as the Prime Minister and others holding important offices. We argue that political parties are distinct from the personalities affiliated to them and from the governments they form. This justifies excluding these entities from our seed set, since they might correspond to issues related to (i) the functioning of the government rather than the party in particular, or (ii) the functioning of personalities who may hold different portfolios within the government, have their own charismatic presence on many issues, and speak in a capacity other than that of a party member. We therefore limit the keywords to the seed set above, which includes party names, acronyms, names of student wings and names of the democratic alliances in which either of the two parties is the most influential member.

IV Methodology

This section is laid out in three parts. In the first part we discuss the different metrics of imbalance. In the second part, we demonstrate how the different temporally varying metrics of imbalance can be summarised to reflect certain universal patterns. Finally, we outline a method to identify word associations that could reflect polarisation.

IV-A Uniform metric of imbalance

We adopt the method of determining imbalance from the acclaimed work of Lacy et al [27]. For each metric $m$ that we use in the subsequent analysis, we compute an imbalance score at the granularity of a month. Thus, for a particular month $mm$ of a year $yy$, the imbalance score (henceforth, $IS^{m}_{yy\text{-}mm}$) is calculated as follows, adapting the work of Lacy et al [27].

$$IS^{m}_{yy\text{-}mm} = \frac{score(B^{m}_{yy\text{-}mm}) - score(C^{m}_{yy\text{-}mm})}{score(B^{m}_{yy\text{-}mm}) + score(C^{m}_{yy\text{-}mm})} \times 100 \qquad (1)$$

where $B^{m}_{yy\text{-}mm}$ and $C^{m}_{yy\text{-}mm}$ correspond to the aggregated documents for the two political parties considered (BJP and Congress, respectively, in our case). For each metric, we detail a criterion for including the sentences of the contents/headlines in either or both of the two documents. We also detail, for each metric, a criterion for determining the score of a document, which lets us compute the two scores in the equation above.

Next, we obtain the imbalance scores for each metric across the timeline of 2010–2018. Note that this imbalance score has a direction. All positive scores correspond to a leaning toward BJP and all the negative scores correspond to a leaning toward Congress. The absolute value of the score denotes the imbalance without direction. Apart from the timeline plots illustrating the imbalance with direction, we also compute the aggregate values of the absolute imbalance score for each metric averaged over the timeline of computation.
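A minimal sketch of the monthly score and its undirected average is given below. It assumes that the score is the normalised difference between the two party scores expressed in percent, with positive values leaning toward BJP; the function names and the handling of months with no coverage are illustrative choices, not taken from the paper.

```python
def imbalance_score(score_bjp, score_congress):
    """Directed imbalance in percent: positive leans BJP, negative leans Congress."""
    total = score_bjp + score_congress
    if total == 0:
        # Assumption: a month with no coverage of either party counts as balanced.
        return 0.0
    return 100.0 * (score_bjp - score_congress) / total

def mean_absolute_imbalance(monthly_scores):
    """Average of |score| over the timeline, i.e. imbalance without direction."""
    return sum(abs(s) for s in monthly_scores) / len(monthly_scores)
```

For example, document scores of 60 for BJP and 40 for Congress in some month give a directed score of +20, while 25 and 75 give -50.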

IV-B Coverage imbalance

We have done two studies to find balance in coverage of political parties in newspapers, one on the basis of (i) the headlines and the other on the basis of (ii) the content of the article.


Sentence inclusion criterion: If one or more of the BJPkeywords introduced in the previous section are present in the headline of a news article, we include the headline in $B^{h}_{yy\text{-}mm}$. Similarly, if one or more of the Congresskeywords introduced in the previous section are present in the headline of a news article, we include that headline in $C^{h}_{yy\text{-}mm}$. If the headline contains keywords from both sets, it is included in both documents.

Score of each document: Each headline forms one unit of attention for the general populace. So, we simply count the number of headlines included in each document as the score of that document.


Sentence inclusion criterion: If one or more of the BJPkeywords are present in a sentence of the content of a news article, we include that sentence in $B^{c}_{yy\text{-}mm}$. Similarly, if one or more of the Congresskeywords are present in a sentence of the content of a news article, we include that sentence in $C^{c}_{yy\text{-}mm}$. If the sentence contains keywords from both sets, it is included in both documents.

Score of each document: In contrast to headlines, content is consumed by the volume of words written. As words form the atomic unit of semantic expression, we use the number of words in the whole document as the score of that document for the content metric.
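The two coverage scores can be sketched as below. The regex-based sentence splitter and the whole-word keyword test are simplifying assumptions made for a self-contained example (a proper sentence tokenizer would normally be used), and all function names are hypothetical.

```python
import re

def split_sentences(text):
    """Crude sentence splitter on ., ! and ? (assumption: stands in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def mentions(text, keywords):
    """True if any keyword occurs in `text` as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)

def headline_score(headlines, keywords):
    """Headline metric: number of headlines containing at least one keyword."""
    return sum(1 for h in headlines if mentions(h, keywords))

def content_document(articles, keywords):
    """Content document: every sentence that contains at least one keyword."""
    return [s for a in articles for s in split_sentences(a) if mentions(s, keywords)]

def content_score(articles, keywords):
    """Content metric: total number of words in the content document."""
    return sum(len(s.split()) for s in content_document(articles, keywords))
```

A sentence or headline that matches both keyword sets is counted once for each party, which mirrors the inclusion rule of adding it to both documents.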

Point of view imbalance

To understand which party’s point of view is presented, we turn to narrative verbs like “say” and “tell”. Although much research [36] has gone into quote attribution, we find that the majority of the points of view presented in newspapers appear as indirect, reported speech. So, to account for both direct and indirect speech, we count the number of times a noun phrase containing at least one keyword/keyphrase of BJP or Congress is the subject of a sentence. This gives us the sentences that highlight something said by BJP, Congress, or some member of either party.

Sentence inclusion criterion: The sentence inclusion criterion is identical to the one used for determining content imbalance.

Score of each document: With this score, we wish to get a rough statistical estimate of the number of words used to represent the point of view of each party. So, as discussed above, we pick those sentences that have any form of a narrative verb like “say” or “tell” as the main verb and a subject noun phrase containing the keywords/phrases related to the party of the document. The score of a document is then the total number of words in the sentences picked from it.
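The selection rule for point of view sentences can be illustrated on sentences that have already been dependency-parsed. The sketch below assumes an external parser has produced, for each token, its text, lemma, dependency label and head index, with spaCy-style labels such as `ROOT`, `nsubj` and `punct`; the `Token` record and the function names are hypothetical and not part of the original pipeline.

```python
from collections import namedtuple

# Hypothetical token record: surface text, lemma, dependency label, index of the head token.
Token = namedtuple("Token", "text lemma dep head")

NARRATIVE_LEMMAS = {"say", "tell"}

def subtree(tokens, root_idx):
    """Indices of root_idx and of every token that depends on it, directly or indirectly."""
    children = {}
    for i, tok in enumerate(tokens):
        if tok.head != i:  # the root points to itself in this convention
            children.setdefault(tok.head, []).append(i)
    stack, seen = [root_idx], []
    while stack:
        i = stack.pop()
        seen.append(i)
        stack.extend(children.get(i, []))
    return sorted(seen)

def pov_words(tokens, keywords):
    """Word count of the sentence if a party noun phrase is the subject of say/tell, else 0."""
    for i, tok in enumerate(tokens):
        if tok.dep == "ROOT" and tok.lemma in NARRATIVE_LEMMAS:
            for j, child in enumerate(tokens):
                if child.head == i and child.dep == "nsubj":
                    phrase = " ".join(tokens[k].text for k in subtree(tokens, j))
                    if any(k in phrase for k in keywords):
                        # Count every token except punctuation as a word.
                        return sum(1 for t in tokens if t.dep != "punct")
    return 0
```

Summing `pov_words` over all sentences of a party's document gives a point of view score in words for that party.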

IV-C Tonality imbalance

Here we discuss two different forms of imbalance metrics - (i) sentiment imbalance and (ii) subjectivity imbalance.

Sentiment imbalance

We perform sentiment analysis of the articles using the widely used VADER sentiment analyzer of NLTK.


Sentence inclusion criterion: The sentence inclusion criterion is identical to the one used for coverage imbalance in content.

Score of each document: Sentiment associations with keywords are studied here following the analysis by Zhang et al [54] of sentiment association with topics to determine the imbalance/bias of a news source. For each sentence in the BJP or Congress document, we determine the positive/negative sentiment expressed in it using the VADER sentiment analyzer. The final score of the document is the average of these sentence-level sentiments, weighted by the number of words in each sentence. We use this weighting scheme to account for the semantic space, since sentiment is expressed over the words of a sentence; the density of sentiment per word is therefore a useful measure here.
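The word-weighted averaging can be sketched independently of the sentiment tool. In the snippet below, `sentence_scorer` is a placeholder for any per-sentence scorer; the function name and the whitespace-based word count are assumptions made for illustration.

```python
def weighted_document_score(sentences, sentence_scorer):
    """Word-weighted mean of per-sentence scores over one party's document."""
    total_words = 0
    weighted_sum = 0.0
    for sentence in sentences:
        n_words = len(sentence.split())  # whitespace tokens as a proxy for words
        total_words += n_words
        weighted_sum += n_words * sentence_scorer(sentence)
    # An empty document gets a neutral score of 0.0 (assumption).
    return weighted_sum / total_words if total_words else 0.0

# With NLTK's VADER (requires the `vader_lexicon` resource), a positive-sentiment
# scorer could be plugged in along these lines:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   pos_scorer = lambda s: analyzer.polarity_scores(s)["pos"]
```

The same helper applies to the negative sentiment score (the `"neg"` entry of VADER's output) and to any other sentence-level score that should be weighted by sentence length.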

(a) Dendrogram of coverage imbalance (headline) timelines clustering.
(b) Dendrogram of coverage imbalance (content) timelines clustering.
(c) Dendrogram of point of view imbalance timelines clustering.
(d) Dendrogram of positive sentiment imbalance timelines clustering.
(e) Dendrogram of negative sentiment imbalance timelines clustering.
(f) Dendrogram of subjectivity imbalance timelines clustering.
Fig. 1: Dendrograms of the different measures of imbalance across the different newspapers.
(a) Dendrogram of Times of India
(b) Dendrogram of India Today
(c) Dendrogram of The Hindu
Fig. 2: Dendrograms of the different newspapers across the different measures of imbalance. (Acronyms used and their full forms: cov_h: coverage imbalance in headings; cov: coverage imbalance in content; sup: superlative and comparative imbalance; pos: positive sentiment imbalance; neg: negative sentiment imbalance; subj: subjectivity imbalance)

Subjectivity imbalance

We compute various quantities here, most of which pertain to some notion of subjectivity as per the standard literature. In particular, we compute subjectivity and the use of superlatives and comparatives in the text of the content using the TextBlob (https://github.com/sloria/textblob) and NLTK (https://www.nltk.org/) libraries.

Sentence inclusion criterion: For all the measures of subjectivity, we use the same sentence inclusion criterion as the one used for coverage imbalance in content.

Score of each document: Sentiment and subjectivity are closely related concepts, so we use the same scoring rationale here as for sentiment imbalance. For each subjectivity metric, we score every sentence in the document using the aforementioned tools and take the average of these scores across all sentences, weighted by the number of words in each sentence, as the final score of the document.

For superlatives and comparatives, we compute the average percentage of superlatives and comparatives used in the sentences where either party is mentioned (based on the same keyword based filtering), as an alternative measure of subjectivity.
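One way to compute such a percentage is to work on part-of-speech tagged sentences. The sketch below assumes Penn Treebank tags (the tag set produced by NLTK's default tagger), where `JJR`/`RBR` mark comparatives and `JJS`/`RBS` mark superlatives; the function names and the per-sentence averaging are illustrative assumptions.

```python
# Penn Treebank tags for comparative and superlative adjectives and adverbs.
COMPARATIVE_SUPERLATIVE_TAGS = {"JJR", "JJS", "RBR", "RBS"}

def sup_comp_percentage(tagged_sentence):
    """Percentage of tokens in one sentence that are comparatives or superlatives."""
    if not tagged_sentence:
        return 0.0
    hits = sum(1 for _, tag in tagged_sentence if tag in COMPARATIVE_SUPERLATIVE_TAGS)
    return 100.0 * hits / len(tagged_sentence)

def average_sup_comp_percentage(tagged_sentences):
    """Unweighted mean of the per-sentence percentages over a party's document."""
    if not tagged_sentences:
        return 0.0
    return sum(sup_comp_percentage(s) for s in tagged_sentences) / len(tagged_sentences)
```

Each input sentence is a list of `(word, tag)` pairs, for instance as returned by a tagger applied to the keyword-filtered sentences.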

IV-D Summary based on time series clustering

For each individual imbalance metric and each newspaper, one can obtain a time series of the directed scores spanning over 9+ years. While one can always look into each such time series data to make an inference, our idea was to look for universal characteristics of imbalance across the three news outlets. To this end, we cluster the time series of imbalance scores using the standard dynamic time warping (DTW) approach. We use hierarchical agglomerative clustering to understand the similarity among the newspapers based on their temporal imbalance characteristics. In addition, this clustering technique also summarizes which of the metrics have remained closer to each other over time.
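The clustering step can be illustrated with a small pure-Python sketch: a textbook DTW distance between two score series, followed by agglomerative merging. The text does not specify the linkage criterion or the local cost, so average linkage and absolute difference are assumptions here, as are the function names.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def agglomerative_merges(series_by_name):
    """Average-linkage agglomerative clustering on DTW distances.

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(series_by_name)
    dist = {
        frozenset((p, q)): dtw_distance(series_by_name[p], series_by_name[q])
        for i, p in enumerate(names) for q in names[i + 1:]
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[frozenset((p, q))] for p in clusters[i] for q in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The order of merges corresponds to reading a dendrogram bottom-up: the pair of series merged first is the pair that stays closest over time under DTW.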

IV-E Summary based on aggregation of scores

We aggregate the documents of each of the two parties over the whole timeline to obtain two aggregate documents. Next, we compute each of the above metrics using Equation 1 to obtain the aggregate imbalance score corresponding to each metric and each political party.

IV-F Word embedding association test (WEAT)

In order to understand how the popularity of the political parties among the people of India has changed over time we calculate the year specific word embeddings on our corpus using methods used to measure semantic shift in words over time.

Previous research [1] suggests that frequently used words have the least shift in their meaning over time. Hence we use the top 1000 words in our corpus (consciously excluding the words BJP and Congress) to align the word embeddings trained for different time periods. We train word2vec [31] with SGNS (skip-gram with negative sampling) to create embeddings for each year in our dataset. Let $W^{(t)}$ be the matrix of word embeddings learnt for year $t$ and for vocabulary $V$. Following Hamilton et al [21], we jointly align the word embeddings during generation, using the top 10000 common words present across time periods and by optimizing:

For simplicity we assume . After alignment, we measure the WEAT score of the words BJP and Congress with the opposite set of words {good, honest, efficient, superior} and {bad, dishonest, inefficient, inferior} using the algorithm presented in [3].

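The differential association of a word $c$ with word sets $X$ and $Y$ is given by

$$g(c, X, Y, w) = \underset{x \in X}{\mathrm{mean}} \cos(w_c, w_x) - \underset{y \in Y}{\mathrm{mean}} \cos(w_c, w_y)$$

where $w$ is the set of word embeddings, $w_c$ is the word embedding for the word $c$.

Now, the WEAT score is calculated as

$$\mathrm{WEAT}(S, T, X, Y) = \frac{\underset{s \in S}{\mathrm{mean}}\, g(s, X, Y, w) - \underset{t \in T}{\mathrm{mean}}\, g(t, X, Y, w)}{\underset{c \in S \cup T}{\mathrm{SD}}\, g(c, X, Y, w)}$$

Here the word sets $S$ and $T$ are the keywords related to the political parties, as already discussed previously, while $X$ and $Y$ are the two opposing sets of attribute words introduced above.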
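As an illustration, the quantities involved in a WEAT-style test can be computed with a few lines of plain Python. The sketch assumes the standard formulation from [3]: the differential association of a word is its mean cosine similarity to one attribute set minus its mean cosine similarity to the other, and the score is the difference of mean associations of the two target sets divided by the standard deviation over all target words. The function names, the use of the sample standard deviation and the toy embeddings are assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def differential_association(word, attrs_x, attrs_y, emb):
    """Mean cosine similarity to attribute set X minus mean cosine similarity to set Y."""
    sim_x = sum(cosine(emb[word], emb[a]) for a in attrs_x) / len(attrs_x)
    sim_y = sum(cosine(emb[word], emb[a]) for a in attrs_y) / len(attrs_y)
    return sim_x - sim_y

def weat_effect_size(targets_s, targets_t, attrs_x, attrs_y, emb):
    """Difference of mean associations of S and T, scaled by the sample SD over S and T."""
    g_s = [differential_association(w, attrs_x, attrs_y, emb) for w in targets_s]
    g_t = [differential_association(w, attrs_x, attrs_y, emb) for w in targets_t]
    all_g = g_s + g_t
    mean_all = sum(all_g) / len(all_g)
    sd = math.sqrt(sum((g - mean_all) ** 2 for g in all_g) / (len(all_g) - 1))
    return (sum(g_s) / len(g_s) - sum(g_t) / len(g_t)) / sd
```

Note that with a single word in each target set the normalised score can only take the values plus or minus the square root of two, so in that setting the per-word differential association is the more informative quantity to track over the years.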

V Experiments and results

In this section we discuss the key experiments and then detail the corresponding results.

V-A Summary based on time series clustering

We have seven different imbalance metrics, namely headlines coverage, content coverage, point of view, positive sentiment, negative sentiment, subjectivity and superlatives/comparatives. For a given news outlet, we therefore have seven corresponding time series, each spanning 9+ years. Since there are three major news outlets in our dataset, we have 21 time series in all. We cluster these 21 time series using DTW as discussed in the previous section. As evident from the dendrograms in Figure 1, Times of India and India Today exhibit stronger clustering across almost all the imbalance measures, pointing to an interesting universal characteristic. In order to delve deeper into the dynamics, we present in Figures 3(a) and 3(b) two representative time series plots of imbalance scores: content coverage imbalance and positive sentiment imbalance, respectively.

(a) Content coverage trends.
(b) Positive sentiment trends.
Fig. 3: Temporal variation of imbalance in coverage of content and positive sentiments in the news articles for the different media houses.

Content coverage imbalance: In Figure 3(a), we plot the directed imbalance scores of content for each of the media houses over time. The first noticeable trend is that the media houses have distinct relative biases that are very consistent over the timeline. The Hindu has been especially consistent in maintaining a 5-10% shift in coverage toward the Congress party relative to the other two news organizations. As a consequence, The Hindu always remained Congress leaning (below zero in the curve), unlike its two peers. The shift between TOI and India Today is not as apparent, which provides empirical justification for the results obtained from the DTW clustering.

Positive sentiment imbalance: In Figure 3(b), we observe that The Hindu has an imbalance score above the zero mark until 2017, showing a higher density of positive sentiments toward BJP. This behavior is in drastic contrast with the other two news outlets, which supports the DTW results. As expected, the election year (2014, which also marked a significant change in vote share and public sentiment) sees high positive sentiments in favor of BJP in general.

As a next step, we cluster the seven time series corresponding to the respective imbalance metrics for each of the news outlets separately (see Figure 2). For all the news outlets we again observe a universal pattern: the tonality based imbalances cluster more strongly with one another and are distinct from the coverage and point of view imbalances.
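The DTW-based comparison of time series described above can be sketched as follows. This is a minimal illustration, assuming the classic dynamic-programming form of DTW with absolute difference as the local cost; the series values and the outlet labels are invented for the example, and the exact clustering configuration used in the paper (described in the previous section) may differ.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Assumption: the local cost is the absolute difference between points.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def closest_pair(series_by_name):
    """Return the pair of series names with the smallest DTW distance.

    This is the pair an agglomerative clustering would merge first.
    """
    pairs = combinations(sorted(series_by_name), 2)
    return min(
        pairs,
        key=lambda p: dtw_distance(series_by_name[p[0]], series_by_name[p[1]]),
    )


# Hypothetical monthly imbalance scores (not taken from the paper).
toy_series = {
    "outlet_A": [0.10, 0.12, 0.15, 0.11, 0.09],
    "outlet_B": [0.11, 0.13, 0.14, 0.12, 0.10],
    "outlet_C": [-0.30, -0.25, -0.28, -0.31, -0.27],
}
first_merge = closest_pair(toy_series)
```

In this toy input the two outlets with near-identical trajectories are paired first, which mirrors the kind of grouping a dendrogram would show at its lowest merge.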

V-B Summary based on aggregation of scores

Table II shows the aggregate imbalance scores for each of the newspapers across all metrics for a fair quantitative comparison. We can see that there is no clear winner or loser in terms of imbalance. TOI shows the highest degree of imbalance on three metrics, while The Hindu and India Today show the highest imbalance on two metrics each. One peculiar point to note here is that for The Hindu, the average positive and negative sentiments are both BJP leaning. One can argue that this is counterintuitive: if the density of positive sentiments in favor of one party is high for a media house in a given month, the density of negative sentiments toward that party should be low for that month. Although intuitive, this does not necessarily hold, since both positive and negative sentiments can be expressed strongly about a party if that party is discussed more in the inkspace. For instance, at multiple time points, both the positive and negative sentiment scores for The Hindu are well above the zero line, unlike the other two media houses (data not shown for paucity of space).

Metric  | TOI   | The Hindu | India Today
CovHead | 16.18 | 9.87      | 14.72
CovCon  | 4.49  | 11.72     | 2.50
PoV     | 78.31 | 70.14     | 62.45
PosSent | 0.44  | 2.42      | 0.20
NegSent | 2.02  | 1.05      | 0.54
Subj    | 0.28  | 0.57      | 1.04
SupComp | 1.15  | 1.96      | 5.22
TABLE II: Aggregate absolute imbalance scores: An upward arrow at the left of any number denotes an imbalance toward BJP and vice versa. The highest absolute imbalance score for each metric has been highlighted in boldface.
Fig. 4: Temporal variation of popularity of each party for the India Today corpus. The trends are very similar for the other two news outlets.

(a) Inverse of standard deviation (imbalance in coverage of all states together)
(b) Coverage (in %) of bottom 20% states
(c) Coverage (in %) of bottom 50% states
Fig. 5: Trends in imbalance in coverage of states over time.

V-C WEAT scores to determine party popularity

We calculate the differential association over time and plot it over the years. We use this distance as a proxy for popularity as portrayed by that particular news media outlet. From Figure 4, it is evident that BJP gained popularity in the news very quickly after 2011, surpassing the popularity of Congress in 2014, the year of the general election in which the opposition BJP defeated the incumbent Congress-led government. We also see the popularity of Congress increasing again since 2016, the year of demonetization, which possibly had a strong impact on the economy of India and especially on the poorest people of the country (footnote 11: https://www.bbc.com/news/world-asia-india-46400677).
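A WEAT-style differential association can be sketched as below. This is only an illustration under stated assumptions: it takes the association of a target word (for example a party name) to be its mean cosine similarity to one attribute set minus its mean cosine similarity to another. The two-dimensional vectors are invented; the actual word sets and embedding model are defined elsewhere in the paper and are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def differential_association(target, attrs_a, attrs_b):
    """WEAT-style s(w, A, B): mean cos(w, a) over A minus mean cos(w, b) over B."""
    mean_a = sum(cosine(target, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(target, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Hypothetical embeddings (not from the paper): the target points along the
# first axis, attribute set A lies near it, attribute set B is orthogonal.
party_vec = [1.0, 0.0]
pleasant = [[1.0, 0.0], [0.8, 0.6]]
unpleasant = [[0.0, 1.0]]
score = differential_association(party_vec, pleasant, unpleasant)
```

Computing such a score on embeddings trained separately for each year would give one value per year, which is the kind of series a plot like Figure 4 tracks.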

V-D Imbalance in portrayal of different states/cities

Non-Hindi speaking states, and especially states from north-east India [18] and Jammu & Kashmir, have often accused other parts of India of cultural exclusion (footnote 12: https://towardfreedom.org/story/indian-medias-missing-margins/). We attempt to understand to what extent those allegations hold and whether the situation is changing over time.

V-D1 City level analysis

We prepare a list of the 25 most populous cities in India according to the census report of 2011 (footnote 13: http://censusindia.gov.in/2011-Common/CensusData2011.html) and measure how these cities are covered by the news articles. We count one entry for a city if the city is mentioned at least once in a news article. We thus calculate the share of each city for a specific media house. We illustrate this imbalance in the coverage of cities in Figure 6.
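The counting rule above (one entry per city per article, then a share per city) can be sketched as follows. This is a simplified illustration: the case-insensitive substring match and the example articles are assumptions made for the sketch, and a real pipeline would need more careful name matching.

```python
from collections import Counter


def city_shares(articles, cities):
    """Share of city entries, counting each city at most once per article."""
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        for city in cities:
            # Naive substring match (assumption); repeated mentions within
            # one article still add only a single entry for that city.
            if city.lower() in lowered:
                counts[city] += 1
    total = sum(counts.values())
    return {city: (counts[city] / total if total else 0.0) for city in cities}


# Invented example articles, for illustration only.
toy_articles = [
    "Delhi saw heavy rain. Delhi traffic slowed.",
    "Mumbai and Delhi markets rose.",
    "Chennai hosted the summit.",
]
shares = city_shares(toy_articles, ["Delhi", "Mumbai", "Chennai"])
```

Here the first article contributes a single entry for Delhi even though the name appears twice.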

Fig. 6: Coverage of different cities by the three news outlets (only top five cities covered are shown to facilitate increased visibility).
(a) Times of India, all 9 years
(b) India Today, all 9 years
(c) The Hindu, all 9 years
(d) Times of India, 2018
(e) India Today, 2018
(f) The Hindu, 2018
(g) Times of India, 2018
(h) India Today, 2018
(i) The Hindu, 2018
Fig. 7: Number of articles mentioning each state for different newspapers across different time periods

Observation: We can see from Figure 6 that the distribution of cities covered by each of the newspapers is heavily skewed, with the five most frequently covered cities accounting for 60-70% of the articles. Delhi and Mumbai are the two most dominant cities on this list for all three newspapers.

V-D2 State level analysis

Now we attempt to understand whether the situation is similar at the state level and, if so, which states are covered poorly. We collect all the state names from the census report of 2011 (footnote 13) and search for their occurrence across the corpus. We note the number of articles in which a specific state is mentioned and plot the same. We do this experiment for all three newspapers, and for each newspaper we plot once for 2010 only and once for 2018 only to understand the evolving trend.

Observations: It is evident from Figure 7 that the allegations mentioned at the start of the section are borne out. Jammu & Kashmir, the states in the north-east and some non-Hindi speaking states like Orissa or Jharkhand are squarely ignored by all three newspapers. The Hindu stands out from the other two newspapers in its coverage of states, covering mostly south Indian states. Comparing the maps from 2010 and 2018, it seems that the situation is improving and more states are getting covered by the national newspapers over time, though equality among the states may still be a long way off.

V-D3 Is the situation getting better/worse?

To better understand the trend, we attempt to quantify homogeneity using three metrics. First we convert the counts of states to probabilities by dividing them by the total number of mentions of all states. Next we define the first metric of homogeneity as the inverse of the standard deviation of the probability distribution (standard deviation is an established measure of dispersion, so its inverse grows as coverage becomes more even). The next two measures take a more bucketed approach, trying to understand whether the low priority states for that particular newspaper are getting more attention over time. Here the low priority states are the bottom 20% of states for the second metric and the bottom 50% of states for the third metric.
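The three homogeneity metrics described above can be sketched as below. This is a minimal sketch under stated assumptions: it uses the population standard deviation, ranks states by their own share to pick the bottom 20% and 50%, and the state names and counts are invented for illustration.

```python
import statistics


def homogeneity_metrics(state_counts):
    """Return (inverse std, bottom-20% coverage in %, bottom-50% coverage in %)."""
    total = sum(state_counts.values())
    # Probabilities sorted ascending, so the least covered states come first.
    probs = sorted(count / total for count in state_counts.values())

    # Assumption: population standard deviation; a perfectly uniform
    # distribution has zero spread, reported here as infinite homogeneity.
    spread = statistics.pstdev(probs)
    inverse_std = float("inf") if spread == 0 else 1.0 / spread

    def bottom_share(fraction):
        k = max(1, int(len(probs) * fraction))
        return 100.0 * sum(probs[:k])

    return inverse_std, bottom_share(0.2), bottom_share(0.5)


# Hypothetical article counts per state (not taken from the paper).
toy_counts = {
    "s0": 1, "s1": 1, "s2": 2, "s3": 2, "s4": 4,
    "s5": 10, "s6": 10, "s7": 20, "s8": 20, "s9": 30,
}
inverse_std, bottom20, bottom50 = homogeneity_metrics(toy_counts)
```

Tracking these three values year by year would show whether the least covered states are gaining share, which is the question the figure on state coverage trends addresses.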

Observations: From Figure 5, we observe that the intuitive conclusions drawn from the last analysis hold true for Times of India and India Today on all the metrics. For The Hindu, we see no clear trend in the first metric (i.e., inverse of standard deviation). However, on both the other metrics we note that the newspaper is clearly diversifying its coverage over the states.

VI Further insights

In order to obtain further insights, we perform a cloze task [51], i.e., a task that requires completion of a sentence by correctly predicting the masked/hidden word. For instance, in the cloze task “Sun is a huge ball of mask”, “fire” is a likely completion for the missing word. Given a cloze test, well-known language models like RoBERTa [30] produce a sequence of tokens with their corresponding probabilities to fill the given blank in the input sentence. We train RoBERTa (initialized with RoBERTa-base [30]) separately for each of our newspapers and for each year present in the corpus for 20000 iterations, following the language model training procedure described in Khadilkar et al. [22]. This results in one model per newspaper per year. We use these models (each a representative/mouthpiece of a newspaper at a given time within the 9 years of our corpus) to answer the following three questions: (a) can one track the changing priorities for India as depicted by each news media house? (b) how are these newspapers reporting the popularity of one party over the other during these 9 years? and (c) how are newspapers presenting perception about the Indian economy?
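Querying a masked language model for a cloze prompt can be sketched as follows. This is an illustration rather than the training and inference code used in the paper: `cloze_distribution` and `stub_fill_mask` are hypothetical helpers, the scores in the stub are made up, and the commented lines show how one might plug in the Hugging Face `transformers` fill-mask pipeline with the public `roberta-base` checkpoint, whose mask token is `<mask>`.

```python
def cloze_distribution(fill_mask, prompt, top_k=50):
    """Return {token: probability} for the masked position in `prompt`.

    `fill_mask` is any callable returning a list of dicts with "token_str"
    and "score" keys, the shape produced by the Hugging Face `transformers`
    fill-mask pipeline.
    """
    predictions = fill_mask(prompt, top_k=top_k)
    # RoBERTa tokens usually carry a leading space, so strip it.
    return {p["token_str"].strip(): p["score"] for p in predictions}


# With a real model one might build the callable like this (needs the
# `transformers` package and a model download, so it is left commented out):
# from transformers import pipeline
# fill_mask = pipeline("fill-mask", model="roberta-base")
# cloze_distribution(fill_mask, "Sun is a huge ball of <mask>.")


def stub_fill_mask(prompt, top_k=50):
    """Hypothetical stand-in with made-up scores, for demonstration only."""
    fake = [
        {"token_str": " fire", "score": 0.6},
        {"token_str": " gas", "score": 0.3},
        {"token_str": " light", "score": 0.1},
    ]
    return fake[:top_k]


result = cloze_distribution(stub_fill_mask, "Sun is a huge ball of <mask>.", top_k=2)
```

Keeping the model behind a callable makes it straightforward to swap in a separately fine-tuned checkpoint for each newspaper and year.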

Cloze probe | Outlet column | Tokens more accepted in 2018 | Tokens less accepted in 2018
The main issue in India is mask | 1 | unemployment, water, terrorism, jobs, farmers, corruption, employment, women, agriculture, reservation, fuel, caste, GST, development, food | Kashmir, migration, terror, Afghanistan, security, India, democracy, Pakistan, elections, insecurity, peace, violence, inflation
The main issue in India is mask | 2 | unemployment, corruption, employment, water, GST, development, jobs, reservation, agriculture, poverty, immigration, housing, governance, money, food | Kashmir, terrorism, terror, prices, India, Pakistan, trade, Afghanistan, inflation, peace
The main issue in India is mask | 3 | unemployment, corruption, poverty, education, GST, water, Aadhaar, malnutrition, pollution, agriculture, farmers, immigration, inequality, healthcare, democracy | terrorism, Kashmir, terror, Afghanistan, security, Pakistan, trade, inflation, development
The economy of India is mask | 1 | growing, strong, slowing, weak, stagnant, thriving, developing, intact, poor, healthy, deteriorating, flourishing, bleeding, backward, expanding | crumbling, shrinking, dying, suffering, divided, rotting, exploding, changing, broken, paralyzed, declining, collapsing, reeling, weakening, fragile
The economy of India is mask | 2 | shrinking, crumbling, dead, collapsing, destroyed, slowing, falling, broken, bleeding, different, intact, deteriorating, poor | growing, vibrant, stagnant, flourishing, struggling, strong, booming, recovering, healthy, thriving, weak, improving, stable, fragile, sound
The economy of India is mask | 3 | changing, suffering, shrinking, developing, huge, dying, broken, transforming, declining, great, poor, weak, stagnant, flourishing | growing, robust, booming, slowing, improving, fragile, contracting, thriving, strong, evolving, stable, good, expanding, recovering, vibrant
TABLE III: Top tokens increasingly and decreasingly accepted as answer in 2018 for the cloze task (a) & (c).
(a) Times of India
(b) India Today
(c) The Hindu
Fig. 8: Popularity of BJP over Congress quantified from the results of Cloze test 4, plotted over the years

VI-A Can one track changing priorities?

To understand the changing priorities of India as a nation over the last decade, we propose the following cloze task query: “The main issue in India is mask”. We attempt to understand how RoBERTa’s answer to this specific query changes from 2010 to 2018. For this purpose, we take the union of the top 50 tokens output by RoBERTa-2010 and RoBERTa-2018 (the models trained on the 2010 and 2018 articles, respectively). We then rank the tokens that underwent the maximum positive change from 2010 to 2018 as an answer to the cloze test (i.e., the tokens that are more accepted as an answer in 2018 than in 2010). We also rank the tokens that underwent the maximum negative change from 2010 to 2018 (i.e., the tokens that are less accepted as an answer in 2018 than in 2010). We show a maximum of 15 such tokens in order of probability (higher to lower).
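The ranking step above can be sketched as follows. It is one possible reading of the procedure, with two assumptions labeled in the code: a token missing from one year's output is treated as having probability zero, and tokens are ordered by the size of their change. The probabilities are invented for illustration.

```python
def rank_token_shifts(probs_2010, probs_2018, top_n=50, show=15):
    """Rank tokens by change in cloze probability between the two years."""
    top_2010 = sorted(probs_2010, key=probs_2010.get, reverse=True)[:top_n]
    top_2018 = sorted(probs_2018, key=probs_2018.get, reverse=True)[:top_n]
    candidates = set(top_2010) | set(top_2018)

    # Assumption: a token absent from one year's output counts as 0.0.
    change = {
        tok: probs_2018.get(tok, 0.0) - probs_2010.get(tok, 0.0)
        for tok in candidates
    }
    # Assumption: order by magnitude of change, largest shift first.
    rising = sorted(
        (t for t in candidates if change[t] > 0), key=change.get, reverse=True
    )[:show]
    falling = sorted((t for t in candidates if change[t] < 0), key=change.get)[:show]
    return rising, falling


# Made-up probabilities, not results from the paper.
toy_2010 = {"terrorism": 0.40, "Kashmir": 0.30, "water": 0.20, "unemployment": 0.10}
toy_2018 = {"unemployment": 0.50, "water": 0.25, "terrorism": 0.15, "Kashmir": 0.10}
rising, falling = rank_token_shifts(toy_2010, toy_2018)
```

The two returned lists correspond to the "more accepted in 2018" and "less accepted in 2018" columns of a table like Table III.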

Analysis and observations: From Table III, we see a similar pattern reverberating across the news media houses. The focus of India in 2018 is more on economic issues like unemployment, jobs, corruption, poverty, GST, food and reservation, and less on border issues like Kashmir, Pakistan, Afghanistan and security. More basic demands like food, housing, water and agriculture are popping up in 2018. We showed these results to 9 Indians well versed in the affairs of the Indian government. All of them agreed that these shifts are due to the changing landscape of events affecting India from 2010 to 2018. The period 2008 – 2010 saw a series of coordinated bombing and shooting attacks by terrorists on Mumbai, the economic capital of India, resulting in mass killings and injuries. These issues, mainly related to the India-Pakistan border conflicts, emerge in the words popping up in the 2010 newspapers. Between 2014 and 2018, on the other hand, India saw various economic reforms in the form of the introduction of GST, demonetization, stress on online transactions, and the implementation of Aadhaar (a unified database of citizens, similar to the social security number in the US) and its linking with banking for continuation of banking services. All these together led to an increase in the priority of economy related words in these news outlets.

VI-B How are these newspapers reporting popularity?

We attempt to understand how the popularity of one party over the other is reported in these newspapers and how the newspapers are similar to or different from each other. We define the voting preference toward a specific political party p ∈ {“BJP”, “Congress”} as:




Further, we normalize these values to probabilities toward one of the two parties, arbitrarily selected to be BJP (plotting both is redundant, as the two probabilities sum to one), as


VI-C How are these newspapers reporting economic prosperity?

We attempt to understand how media houses are reporting the economic prosperity of India over time. Using the probe “The economy of India is mask”, we report the most probable outputs in Table III.

Observations: We plot the probabilities in favor of “BJP” over time for each news media house in Figure 8. We observe that once again all the news media groups show a very similar pattern, with the period 2012-2013, a year before the national election, being the inflection point. The opposition ‘BJP’ was able to defeat the incumbent ‘Congress’ government by a large margin following a gain in popularity in 2010-2011, largely due to corruption charges against the ‘Congress’ which resulted in nationwide protests and a very influential anti-corruption movement in the capital. We see that the popularity of ‘BJP’ with respect to ‘Congress’ only rose in the years following the election, which seems intuitive as ‘BJP’ also won the 2019 election with a huge majority and a larger vote share than in 2014 (footnote 14: https://en.wikipedia.org/wiki/Results_of_the_2019_Indian_general_election). Another interesting observation is that a huge policy failure like demonetization, which arguably contributed to the fall of GDP due to extreme shrinkage of money in circulation (footnote 15: https://www.theguardian.com/world/2018/aug/30/india-demonetisation-drive-fails-uncover-black-money) and to a nationwide increase in economic inequality, did not decrease the popularity of ‘BJP’ very significantly, though a dip in popularity is observed in 2016 for all the news media houses. For cloze test (c), we see The Hindu and India Today both reporting similarly about the economy, with more negative words for the economy in 2018, which resonates with the ground truth of the GDP growth rate for India; ToI, interestingly, reports the opposite trend.

VII Discussion

VII-A Is balance necessary?

Many might argue that publicizable material always begets news and that there is no point in considering balance, despite it being recommended by the experts of media watchdog groups (footnotes 16 and 17: https://fair.org/about-fair/, https://www.aim.org/about/who-we-are/) and the prestige press [27]. In conformity with the viewpoint of the experts and the prestige press, we investigate the extent of imbalance here. In fact, this viewpoint is largely motivated by the study done by D’Alessio and Allen [8], since it is standing evidence of such a balanced reporting environment. In a different country, and in a different period which did not see the rise of social media and the democratization of content creation, their study confirms that balanced reporting (i.e., covering the newsmakers and the criticizers equally) is possible and was the norm for most of the newspapers in their specific setting.

VII-B Bias: Then & Now

Lacy et al. [27] showed that the prestige press was distinctly different from the other media outlets, presenting a more balanced coverage of local stories compared to wide circulation media outlets. D’Alessio and Allen [8], in their meta-analysis, considered 59 quantitative studies containing data concerned with partisan media bias in presidential election campaigns in the extended period of 1948–2000, but they found no significant bias in the newspapers or in the newsmagazines. Our study seems to contradict their findings ([27] is contradicted because all three newspapers in our study are highly read and respected for journalism, and [8] is contradicted because they found little bias whereas we found significant bias), albeit in a completely different time window. Further, while we studied the online news media, D’Alessio and Allen considered the print media in their study. We argue that the advent of the Internet and its widespread availability has fundamentally changed the nature of news media over time. Hence the difference is organic and provides validation for the claims reported by the media watchdogs or political parties [24, 46, 20, 37].

The framing and the epistemological biases discussed by the authors in [43] also find relevance in our work. For instance, the framing bias corresponds to the tonality bias that we analyzed here. Similarly, the epistemological bias has parallels to the readability bias which is more subtle and harder to observe.

Finally, there is a significant difference in the way we perceive the notion of fairness. Many might argue that publicizable material always begets news and that there is no point in considering equality of coverage as a measure of fairness, despite that being recommended by the experts of media watchdog groups [10] and the prestige press [27]. In conformity with this viewpoint of the experts and the prestige press, we develop the definition of the angular bias distance based on equality of importance. In fact, this viewpoint is also largely motivated by the study done by D’Alessio and Allen, since it is standing evidence of such a bias-free reporting environment. In a different country, and in a different period which did not see the rise of social media and the democratization of content creation, their study confirms that bias-free reporting (i.e., covering the newsmakers and the criticizers equally) is possible and was the norm for most of the newspapers in their specific setting.

VII-C Generalizability of the bias quantification framework

In this section we discuss the generalizability of the studies that we performed in the previous sections.

Extending to other news media outlets: Although we have used three news outlets for our analysis based on their online availability, the metrics that we propose are generic and can be easily computed for any other media outlet subject to data availability. Owing to the absence of digital archives, we could not analyze some of the major players like Hindustan Times, The Economic Times, The Telegraph etc. However, we are making efforts to gather this data, either by contacting the media houses directly or through digitization of the print versions with the help of one of the national libraries.

In fact, since our metrics are very generic, this study can be easily extended to other countries subject to the online availability of English newspapers. We indeed plan to cover other countries from the Indian subcontinent, including Sri Lanka, Bangladesh, Nepal and Burma, that are socio-politically similar to India.

Extending to a multi-lingual setting: The eighth schedule to the Indian constitution lists as many as 22 scheduled languages. More than one daily newspaper is published in each of these languages. Some of the major players [39] are Dainik Jagaran (Hindi), Malayala Manorama (Malayalam), Daily Thanthi (Tamil), Lokmat (Marathi) etc. No analysis of media outlets in India is complete unless the study is done in a multi-lingual setting. Although our metrics are generic, they are not language agnostic and would require processing multi-lingual text. However, NLP tools are not widely available for every individual Indian language. In future we plan to incorporate some of the languages for which a few state-of-the-art NLP tools are already available, e.g., Hindi, Bengali, Tamil etc.

Extending to a multi-dimensional setting: As also noted earlier, we have restricted our study to only two major national parties (BJP and Congress). However, as per the latest reports from the Election Commission of India, there are 1841 registered parties. Eight of these are national parties, 52 are state parties, while the rest are unrecognized parties [38]. The bias vector can therefore theoretically have 1841 dimensions. In practice, an interesting extension of the current study would be to at least factor the eight national parties into the bias vector. Similarly, another important extension would be to study the state parties separately using a 52-dimensional bias vector. However, the state parties would typically be active in their own states, so this analysis would be more meaningful if performed on the state newspapers (English as well as regional languages).

Extending to a location specific setting: All the analysis presented in the paper considers India as a single geographic unit. However, we have already pointed out that there are differences in the number of articles mentioning different parts of India (location bias). One can therefore extend this study to factor in the location information present in the article. Since data for many of the locations would be extremely sparse, this study can possibly be done only for some of the highly covered states.

VII-D In search for mitigation of bias

Our main objective in this paper was to introduce and quantify the different forms of bias that one is able to observe across the Indian news media outlets. In this section we shall try to outline some of the mechanisms that can be used to mitigate (at least partially) such biases.

Making bias transparent: This approach would make users aware that they are consuming biased news through various visualization techniques implemented on the online newspaper platforms, offering a topic-wise comparison between various media outlets. This might also include simple indicators such as how factual a news item is, or which part of a news story an item covers and with what sentiment. Such nudging practices are widely reported in the literature, with objectives such as delivering multiple aspects of news in social media [41] or encouraging users to read about diverse political opinions [35, 34].

Platform governance: With the exponential penetration of social media, the way users consume news has seen a sea change. Most users active on different social media platforms now consume their daily news from the news stories that are recommended by these platforms [47]. With the explosion in the Indian smartphone market, the number of such users is increasing by leaps and bounds. Such platforms can easily game the users into consuming only biased political news [40]. In February 2019 the United Kingdom’s Digital, Culture, Media and Sport (DCMS) committee issued a verdict in view of this rising problem. The verdict said that social media platforms can no longer hide behind the claim that they are merely a ‘platform’ and therefore have no responsibility for regulating the content they recommend to their users [19]. Along similar lines, the European Union has now issued the ‘EU Code of Conduct on Terror and Hate Content’ (CoT) that applies to the entire EU region. The EU has also recently deployed mechanisms to combat biased and fake news in the online world by constituting working groups that include voices from different avenues including academia, industry and civil society. For instance, in January 2018, 39 experts met to frame the ‘Code of Practice on Online Disinformation’, which was signed by tech giants like Facebook, Google etc. We believe that social system researchers have a lead role to play in such committees, and that no code of conduct can materialize unless the effect of the algorithmic implementation of the policies of these platforms is reexamined empirically.

The road ahead. Our main objective in this paper was to introduce and quantify the different forms of imbalance that one is able to observe across the Indian news media outlets. This can subsequently help in better platform governance in two ways: by informing readers about the extent of the different kinds of imbalance present in an article they are consuming, and by showing, in the recommended list, similar articles on the same topic that highlight the opposite viewpoints. Together these can help create a more inclusive worldview for the readers and burst the filter bubble of biased news consumption. In view of the recent resurgence of debates and legal actions around platform governance [47, 19], this, we believe, is a very significant step forward.

VIII Conclusion & future work

In this paper we formulated and characterized imbalance in media through detailed analysis and discussion mostly in the context of political news. We empirically show the temporal relationship among the news outlets, the changing landscape of events featuring in them over time and the popularity trends of the political parties.

In future we would like to extend this work in multiple directions. One immediate task would be to see if we can study a wider range of media houses across different Indian languages and observe if we get similar results. Another immediate task would be to study evolution of religious and community-wise polarization in Indian society. Finally, we would also like to contribute to mitigation of such imbalances through platform governance.


  • [1] H. Azarbonyad, M. Dehghani, K. Beelen, A. Arkut, M. Marx, and J. Kamps (2017) Words are malleable: computing semantic shifts in political and media discourse. In CIKM, Cited by: §IV-F.
  • [2] E. Bakshy, S. Messing, and L. A. Adamic (2015) Exposure to ideologically diverse news and opinion on facebook. Science 348, pp. 1130–1132. Cited by: §I.
  • [3] M. Brunet, C. Alkalay-Houlihan, A. Anderson, and R. Zemel (2019) Understanding the origins of bias in word embeddings. In ICML, pp. 803–811. Cited by: §IV-F.
  • [4] C. Budak, S. Goel, and J. M. Rao (2014) Fair and balanced? quantifying media bias through crowdsourced content analysis. SSRN Electronic Journal. Cited by: §I-A.
  • [5] S. Carter, F. Fico, and J. A. McCabe (2002) Partisan and structural balance in local television election coverage. JMCQ 79, pp. 41–53. Cited by: §II.
  • [6] D. Corney, D. Albakour, M. Martinez, and S. Moussa (2016) What do a million news articles look like?. In Proceedings of the First International Workshop on Recent Trends in News Information Retrieval co-located with 38th European Conference on Information Retrieval (ECIR 2016), Padua, Italy, March 20, 2016., pp. 42–47. External Links: Link Cited by: §I-B.
  • [7] Media Research Users Council (2017) Indian readership survey: 2017. Cited by: §III.
  • [8] D. D’Alessio and M. Allen (2000) Media bias in presidential elections: a meta-analysis. Journ. of Comm. 50 (4). Cited by: §I-A, §II, §VII-A, §VII-B.
  • [9] J. Eberl (2018) Lying press: three levels of perceived media bias and their relationship with political preferences. Communications. Cited by: §I-B.
  • [10] FAIR (2020) FAIRNESS & accuracy in reporting. Note: https://fair.org/about-fair/ Cited by: §VII-B.
  • [11] Fico and W. Cote (1999) Fairness and balance in the structural characteristics of newspaper stories on the 1996 presidential election. JMCQ 76, pp. 124–137. Cited by: §II.
  • [12] F. Fico and Freedman (2008) Biasing influences on balance in election news coverage: an assessment of newspaper coverage of the 2006 u.s. senate elections. JMCQ. Cited by: §II.
  • [13] F. Fico, L. Ku, and S. Soffin (1994) Fairness, balance of newspaper coverage of u.s. in gulf war. NRJ 15. Cited by: §I-A, §I-A, §II.
  • [14] F. Fico and S. Soffin (1995) Fairness and balance of selected newspaper coverage of controversial national, state, and local issues. JMCQ 72, pp. 621–633. Cited by: §I-A, §I-A, §II.
  • [15] F. Fico, G. A. Zeldes, S. Carpenter, and A. Diddi (2008) Broadcast and cable network news coverage of the 2004 presidential election: an assessment of partisan and structural imbalance. Mass Comm. and Soc. 11 (3), pp. 319–339. Cited by: §II.
  • [16] S. Flaxman, S. Goel, and J. M. Rao (2016) Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, pp. 298–320. Cited by: §I.
  • [17] R. K. Garrett (2009) Echo chambers online?: politically motivated selective exposure among internet news users. J. of Comp.-Med. Comm. 14, pp. 265–285. Cited by: §I.
  • [18] N. A. Giri (2015) Content analysis of media coverage of north east india. Mass Communicator: International Journal of Communication Studies 9 (1), pp. 4–8. Cited by: §V-D.
  • [19] R. Gorwa (2019) The platform governance triangle: conceptualising the informal regulation of online content. Note: https://tinyurl.com/yb6hsys5 Cited by: §VII-D, §VII-D.
  • [20] R. Greenslade (2011) India’s dodgy ’paid news’ phenomenon. Note: https://tinyurl.com/yd2l9dlh Cited by: §VII-B.
  • [21] W. L. Hamilton, J. Leskovec, and D. Jurafsky (2016) Diachronic word embeddings reveal statistical laws of semantic change. In ACL, Cited by: §IV-F.
  • [22] K. Khadilkar, A. R. KhudaBukhsh, and T. M. Mitchell (2021) Gender bias, social bias and representation: 70 years of bollywood. arXiv preprint arXiv:2102.09103. Cited by: §VI.
  • [23] J. Kiesel, M. Mestre, R. Shukla, E. Vincent, D. Corney, P. Adineh, B. Stein, and M. Potthast (2018-11) Data for PAN at SemEval 2019 Task 4: Hyperpartisan News Detection. External Links: Document, Link Cited by: §I-B.
  • [24] M. Krishnan (2017) Indian media is facing a crisis of credibility. Note: https://tinyurl.com/ybkcsh9v Cited by: §VII-B.
  • [25] R. Kulkarni (2018) A Million News Headlines. External Links: Document, Link Cited by: §I-B.
  • [26] J. Kulshrestha, M. Eslami, J. Messias, M. B. Zafar, S. Ghosh, K. P. Gummadi, and K. Karahalios (2017) Quantifying search bias: investigating sources of bias for political searches in social media. In Proceedings of CSCW, pp. 417–432. Cited by: §II.
  • [27] S. Lacy, F. Fico, and T. F. Simon (1991) Fairness and balance in the prestige press. JMCQ 68, pp. 363–370. Cited by: §I-A, §I-A, §II, §II, §IV-A, §VII-A, §VII-B, §VII-B.
  • [28] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, et al. (2018) The science of fake news. Science 359 (6380), pp. 1094–1096. Cited by: §I-A.
  • [29] W. Lin, T. Wilson, J. Wiebe, and A. Hauptmann (2006) Which side are you on? identifying perspectives at the document and sentence levels. In Proceedings of (CoNLL-X), Cited by: §II.
  • [30] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. Cited by: §VI.
  • [31] T. Mikolov, Chen, Corrado, and Dean (2013) Efficient estimation of word representations in vector space. In ICLR, Cited by: §IV-F.
  • [32] A. Mitchell, J. Gottfried, and K. E. Matsa (2015) Millennials and political news: social media – the local tv for the next generation?. Pew Research Center Survey. External Links: Link Cited by: §I.
  • [33] J. S. Morgan, C. Lampe, and M. Z. Shafiq (2013) Is news sharing on twitter ideologically biased?. In Proceedings of CSCW, pp. 887–896. Cited by: §II.
  • [34] S. A. Munson, S. Y. Lee, and P. Resnick (2013) Encouraging reading of diverse political viewpoints with a browser widget. In Proceedings of ICWSM, Cited by: §VII-D.
  • [35] S. A. Munson and P. Resnick (2010) Presenting diverse political opinions: how and how much. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pp. 1457–1466. Cited by: §VII-D.
  • [36] G. Muzny, M. Fang, A. Chang, and D. Jurafsky (2017) A two-stage sieve approach for quote attribution. In EACL, Cited by: §IV-B.
  • [37] S. Ninan (2019) How india’s media landscape changed over five years. Note: https://tinyurl.com/ybe26fmq Cited by: §VII-B.
  • [38] Election Commission of India (2019) List of political parties & symbol main notification. Note: https://eci.gov.in/files/file/9438-list-of-political-parties-symbol-main-notification-dated-15032019/ Cited by: §VII-C.
  • [39] Maps of India (2019) Top newspaper brands in India. Note: https://business.mapsofindia.com/top-brands-india/top-newspaper-brands-in-india.html Cited by: §VII-C.
  • [40] W. Oremus (2016) Of course Facebook is biased. Note: https://tinyurl.com/y8zq9nqz Cited by: §VII-D.
  • [41] S. Park, S. Kang, S. Chung, and J. Song (2009) NewsCube: delivering multiple aspects of news to mitigate media bias. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pp. 443–452. Cited by: §VII-D.
  • [42] ProQuest (2019) ProQuest Historical Newspapers. Cited by: §I-B.
  • [43] M. Recasens, C. Danescu-Niculescu-Mizil, and D. Jurafsky (2013) Linguistic models for analyzing and detecting biased language. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1650–1659. Cited by: §VII-B.
  • [44] P. Resnick, R. K. Garrett, T. Kriplean, S. A. Munson, and N. J. Stroud (2013) Bursting your (filter) bubble: strategies for promoting diverse exposure. In Proceedings of CSCW companion, pp. 95–100. Cited by: §I, §II.
  • [45] R. E. Robertson, S. Jiang, K. Joseph, L. Friedland, D. Lazer, and C. Wilson (2018) Auditing partisan audience bias within Google Search. Proc. ACM Hum.-Comput. Interact. 2 (CSCW), pp. 148:1–148:22. Cited by: §II.
  • [46] G. Sarkar (2018) How ‘India against biased media’ wants to teach journalists a lesson. Note: https://tinyurl.com/yagvy8r3 Cited by: §VII-B.
  • [47] E. Shearer and K. E. Matsa (2018) News use across social media platforms 2018. Note: https://tinyurl.com/y4awgo2p Cited by: §VII-D.
  • [48] T. F. Simon, F. Fico, and S. Lacy (1989) Covering conflict and controversy: measuring balance, fairness, defamation. Journalism Quarterly. Cited by: §I-A.
  • [49] B. Spillane, S. Lawless, and V. Wade (2017) Perception of bias: the impact of user characteristics, website design and technical features. In WI. Cited by: §I-A.
  • [50] R. Streckfuss (1990) Objectivity in journalism: a search and a reassessment. Journalism Quarterly. Cited by: §I-A.
  • [51] W. L. Taylor (1953) “Cloze procedure”: a new tool for measuring readability. Journalism Quarterly. Cited by: §VI.
  • [52] A. Thompson (2018) All the News. Cited by: §I-B.
  • [53] S. Vosoughi, D. Roy, and S. Aral (2018) The spread of true and false news online. Science 359 (6380), pp. 1146–1151. Cited by: §I.
  • [54] J. Zhang, Y. Kawai, S. Nakajima, Y. Matsumoto, and K. Tanaka (2011) Sentiment bias detection in support of news credibility judgment. In HICSS. Cited by: §II, §IV-C.

IX Biography Section

Souvic Chakraborty is a PhD student in the Dept of Computer Science & Engineering at Indian Institute of Technology, Kharagpur.

Pawan Goyal is an Associate Professor in the Dept of Computer Science & Engineering at Indian Institute of Technology, Kharagpur.

Animesh Mukherjee is an Associate Professor (A K Singh Chair) in the Dept of Computer Science & Engineering at Indian Institute of Technology, Kharagpur.