Analyzing Who and What Appears in a Decade of US Cable TV News

08/13/2020 ∙ by James Hong, et al.

Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, and text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces in 244,038 hours of video, label each face's presented gender, identify prominent public figures, and align text captions to audio. We use these labels to perform screen time and word frequency analyses. For example, we find that overall, much more screen time is given to male-presenting individuals than to female-presenting individuals (2.4x in 2010 and 1.9x in 2019). We present an interactive web-based tool, accessible at https://tvnews.stanford.edu, that allows the general public to perform their own analyses on the full cable TV news data set.


1. Introduction

Cable TV news reaches millions of U.S. households each day, and profoundly influences public opinion and discourse on current events (tvnewsPew). While cable TV news has been on air for over 40 years, there has been little longitudinal analysis of its visual aspects. As a result, we have little understanding of who appears on cable TV news and what these individuals talk about.

Consider questions like: What is the screen time of men vs. women? Which political candidates and news presenters receive the most screen time? How are victims and perpetrators of violence portrayed? Which foreign countries are discussed the most? Who is on screen when different topics are discussed?

Figure 2. The percentage of time when at least one face appears on screen has increased on all three channels over the decade (thick lines), with most of the increase occurring between 2015 and 2018. The amount of time when multiple faces are on screen has also increased on all three channels; however, the percentage of time with only one face on screen has declined on CNN and FOX and stagnated on MSNBC.

In this paper, we demonstrate that it is possible to answer such questions by analyzing a data set comprising nearly 24/7 coverage of video, audio and text captions from three major U.S. cable TV news channels – CNN, FOX (News) and MSNBC – over the last decade (January 1, 2010 to July 23, 2019). The data set was collected by the Internet Archive’s TV News Archive (tvnewsarchive) and in total includes 244,038 hours (equivalent to about 27.8 years) of footage. Using automated machine learning tools, we label the data set – e.g., we detect faces, label their presented gender, identify prominent public figures, and align text captions to audio. These labels allow us to detect 68,179 hours of commercials (27.9% of the video), leaving 175,858 hours (72.1%) of news programming (Figure 1a). In this paper, we focus on analyzing the news programming part of this data set.

Each of the resulting labels has a temporal extent, and we use these extents to compute the screen time of faces and identify when faces are on screen and when words are said. We show that, by analyzing the screen time of faces, counting words in captions, and presenting results in the form of time-series plots, we can reveal a variety of insights, patterns, and trends about the data. To this end, we adopt an approach similar to that of the Google N-gram viewer (ngrams), which demonstrated the usefulness of word frequency analysis of 5.2 million books and print media from 1800 to 2000 to many disciplines, and to the GDELT AI Television Explorer (gdelt_ai_explorer), which enables analysis of cable TV news captions and on-screen objects (but not people). The goal of our work is to enable similar analyses of cable TV news video using labels that aid understanding of who is on screen and what is in the captions.

Our work makes two main contributions.


  • We demonstrate that analyzing a decade of cable TV news video generates a variety of insights on a range of socially relevant issues, including gender balance (section 2), visual bias (section 3), topic coverage (section 4) and news presentation (section 5). The details of our complete data processing and labeling pipeline used for these analyses are described in Supplemental 1.

  • We present an interactive, web-based data analysis interface, akin to the Google N-gram viewer and GDELT AI Television Explorer, that allows users to easily formulate their own analysis queries on our annotated data set of cable TV news (section 7). Our analysis interface is publicly accessible at https://tvnews.stanford.edu, and it updates daily with new cable TV news video. Our data processing code will be made available as open source.

2. Who is in the news?

People are an integral part of the news stories that are covered, how they are told, and who tells them. We analyze the screen time and demographics of faces in U.S. cable TV news.

How often is at least one face on screen? We detect faces using the MTCNN (mtcnn) face detector on frames sampled every three seconds (Supplemental 1.3). Face detections span a wide range of visual contexts, from in-studio presenters and guests to people in B-roll footage to static infographics. Overall, we detect 263M total faces, and at least one face appears on screen 75.3% of the time. The percentage of time with a face on screen has risen steadily from 72.9% in 2010 to 81.5% in 2019, and is similar across all three channels (Figure 2).
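The screen-time estimate above can be sketched as follows. This is a simplified stand-in for the paper's pipeline (which runs the MTCNN detector on real frames); here `face_counts` is a hypothetical list of per-frame face counts, one per 3-second sample, so the fraction of samples meeting a threshold approximates the fraction of airtime.

```python
# A sampled frame stands in for the 3 seconds of video around it
# (the paper's sampling interval); the counts below are hypothetical.
SAMPLE_INTERVAL_S = 3

def face_screen_time(face_counts, min_faces=1):
    """Fraction of video time with at least `min_faces` faces on screen,
    estimated from per-sampled-frame face counts."""
    if not face_counts:
        return 0.0
    hits = sum(1 for n in face_counts if n >= min_faces)
    return hits / len(face_counts)

# Example: face counts for ten sampled frames (30 seconds of video).
counts = [0, 1, 1, 2, 0, 3, 1, 0, 2, 2]
any_face = face_screen_time(counts, min_faces=1)    # time with >= 1 face
multi_face = face_screen_time(counts, min_faces=2)  # time with >= 2 faces
```

The same function with `min_faces=2` gives the multi-face statistic discussed next, so both trends in Figure 2 come from one pass over the sampled frames.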

We also observe an increase in the average number of faces on screen. On CNN and FOX the amount of time when only one face is on screen has declined, while it has remained constant on MSNBC. On all three channels, the amount of time when multiple faces (2 or more) are on screen simultaneously has risen. This accounts for the overall increase in time when at least one face is on screen, though we do not analyze which types of content (with no faces on screen) this footage is replacing. We note that while the average number of faces has increased in news content, the average number of faces on screen in commercials has remained flat since 2013 (Supplemental 2.1.1).

How does screen time of male-presenting individuals compare to female-presenting individuals? We estimate the presented binary gender of each detected face using a nearest neighbor classifier trained on FaceNet (facenet) descriptors (Supplemental 1.4). Overall, female-presenting faces are on screen 28.7% of the time, while male-presenting faces are on screen 60.2% of the time, a 0.48 to 1 ratio (Figure 3). These percentages are similar across channels, and have slowly increased for both groups (similar to how the percentage of time any face is on screen has increased). The ratio of female- to male-presenting screen time has increased from 0.41 to 0.54 over the decade (Figure 1b). While the upward trend indicates movement towards gender parity, the rate of change is slow, and these results also reinforce prior observations on the under-representation of women in both film (geenadavis) and news media (gmmp).
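The percentages and the female-to-male ratio above can be computed from per-frame label sets; this is a minimal sketch (not the authors' code), where each sampled frame carries the hypothetical set of presented-gender labels ('F', 'M') detected in it:

```python
def gender_screen_time(frames):
    """frames: one set of presented-gender labels per sampled frame
    ('F', 'M'; a frame may contain both, or neither). Returns the
    fraction of time with a female-presenting face, with a
    male-presenting face, and the F:M screen-time ratio."""
    n = len(frames)
    f = sum(1 for labels in frames if 'F' in labels) / n
    m = sum(1 for labels in frames if 'M' in labels) / n
    return f, m, (f / m if m else float('inf'))

# Hypothetical labels for eight sampled frames (24 seconds of video).
frames = [{'M'}, {'M', 'F'}, set(), {'F'}, {'M'}, {'M'}, {'M', 'F'}, {'M'}]
f_frac, m_frac, ratio = gender_screen_time(frames)
```

Because a frame can contain both labels, the two fractions can sum to more than 100%, matching the note in the Figure 3 caption.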

We acknowledge that our simplification of presented gender to a binary quantity fails to represent transgender or gender nonconforming individuals (keyes2018; hamidi2018). Further, an individual’s presented gender may differ from their actual gender identification. Despite these simplifications, we believe that automatically estimating binary presented gender labels is useful to improving understanding of trends in gender representation in cable TV news media.

Figure 3. The percentage of time male-presenting and female-presenting faces are on screen is similar on all three channels, and has increased over the decade with the rise in all faces noted in Figure 2. Because male- and female-presenting faces can be on screen simultaneously, the lines can add to more than 100%.
Figure 4. Distribution of individuals’ screen time, separated by presenters on each channel and non-presenters (stacked). 65% of individuals with 100+ hours of screen time are news presenters. Note that the three leftmost bars are truncated and the truncated portion includes presenters from all three channels. The leading non-presenters are annotated. See Figure 7 for the top news presenters.
Figure 5. Screen time of U.S. presidential candidates during the campaign and primary season of the 2016 and 2012 elections. (a) Donald Trump received significantly more screen time than the other Republican candidates in 2016. (b) Hillary Clinton and Bernie Sanders, by contrast, received nearly equal screen time during the competitive primary season (January-May 2016). (c) In 2012, Mitt Romney, the eventual Republican nominee, did not decisively overtake the other Republican candidates in screen time until he became the presumptive nominee.
Figure 6. (a) The percentage of time a news presenter is on screen has remained mostly flat on FOX and MSNBC, but has risen by 13% on CNN since 2016. (b-d) Within each channel, the screen time of news presenters by presented-gender (as a percentage of total news presenter screen time) varies across the decade. CNN reaches parity in January-June 2012 and May-August 2015, but has since diverged. Because male- and female-presenting news presenters can be on screen simultaneously, the lines can add to more than 100%.
Figure 7. Screen time of the top five presenters on each channel. Since 2016, several of the top presenters on CNN have risen dramatically in screen time. Following O’Reilly’s firing and Kelly’s departure from FOX in 2017, Hannity and Carlson have risen. Since 2013, the variation in screen time among the top five hosts on MSNBC has been low compared to CNN and FOX.

Which public figures receive the most screen time? We estimate the identity of faces detected in our data set using the Amazon Rekognition Celebrity Recognition API (amazonrekognition). For individuals that are not currently included (or not accurately detected) by the API, we train our own classifiers using FaceNet (facenet) descriptors (Supplemental 1.5). We identify 1,260 unique individuals that receive at least 10 hours of screen time in our data set. These individuals account for 47% of the 263M faces that we detect in the news content and are on screen for 45% of screen time. The top individual is Donald Trump, who rises to prominence in the 2016 presidential campaigning season and from 2017 onward during his presidency (Figure 1c). Barack Obama is second, with 0.63× Trump’s screen time, and is prevalent between 2010 (the start of the data set) and 2017 (the end of his second term). Besides U.S. presidents, the list of top individuals is dominated by politicians and news presenters (e.g., anchors, daytime hosts, field reporters, etc.) (Figure 4).
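Tallying per-person screen time from the identified faces is a simple aggregation; the sketch below assumes (hypothetically) one `(timestamp, person)` record per identified face in a sampled frame, with each detection standing in for the 3-second sampling interval:

```python
from collections import Counter

SAMPLE_INTERVAL_S = 3  # one sampled frame stands in for 3 seconds

def screen_time_hours(detections):
    """detections: (timestamp, person) pairs, one per identified face
    in a sampled frame. Returns estimated screen time per person, in
    hours, from which a top-individuals ranking can be read off."""
    seconds = Counter()
    for _ts, person in detections:
        seconds[person] += SAMPLE_INTERVAL_S
    return {p: s / 3600 for p, s in seconds.items()}

# Hypothetical detections across three sampled frames.
dets = [(0, 'donald_trump'), (3, 'donald_trump'), (6, 'barack_obama')]
hours = screen_time_hours(dets)
```

Sorting the resulting dict by value gives the ranking behind Figure 1c and Figure 4.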

How much screen time do political candidates get before an election? During the 2016 Republican presidential primaries, Donald Trump consistently received more screen time than any other candidate (Figure 5a). In the competitive months of the primary season, from January to May 2016, Trump received 342 hours of screen time, while his closest Republican rival, Ted Cruz, received only 130 hours. In the same timespan, the leading Democratic candidates, Hillary Clinton and Bernie Sanders, received more equal screen time (164 hours for Clinton compared to 139 hours for Sanders); both received far more screen time than the other Democratic primary candidates (Figure 5b). Comparing the two presidential nominees, during the period from January 1, 2016 to election day, Trump received 1.9× the screen time of Clinton.

Unlike Trump in 2016, in the run up to the 2012 presidential election, Mitt Romney (the eventual Republican nominee) did not receive as dominating an amount of screen time (Figure 5c). Other Republican candidates such as Herman Cain, Michele Bachmann, Newt Gingrich, and Rick Santorum have higher peaks than Romney at varying stages of the primary season, and it is not until April 2012 (when his last rival withdraws) that Romney’s screen time decisively overtakes that of his rivals. For reference, Figure 5d shows the screen time of Barack Obama during the same period. As the incumbent president up for re-election, Obama had no significant primary challenger. Obama received more screen time throughout 2011 than Romney because, as the president, he is in the news for events and policy actions related to his duties as president (e.g., U.S. missile strikes in Libya, job growth plan, etc.); in 2012, however, the two are comparable. The overall trends are similar when viewed by channel, with Trump dominating screen time in 2016 on all three channels (Supplemental 2.1.3).

Who presents the news? Cable TV news programs feature hosts, anchors and on-air staff (e.g., contributors, meteorologists) to present the news. We manually marked 325 of the public figures who we identified in our data set as news presenters (107 on CNN, 130 on FOX, and 88 on MSNBC). Overall, we find that a news presenter is on screen 28.1% of the time – 27.4% on CNN, 33.5% on FOX, and 23.0% on MSNBC. On CNN, the percentage of time that a news presenter is on screen increases by 13% between 2015 and 2018, while, on FOX and MSNBC, it remains mostly flat over the decade (Figure 6a).

The news presenters with the most screen time are Anderson Cooper (1,782 hours) on CNN, Bill O’Reilly (1,094 h) on FOX, and Rachel Maddow (1,202 h) on MSNBC. Moreover, while the top presenter on each channel varies a bit over the course of the decade (Figure 7), Cooper and O’Reilly hold the top spot for relatively long stretches on CNN and FOX respectively. Also, while Maddow appears the most on MSNBC overall, Chris Matthews holds the top spot for the early part of the decade (2010 to 2014). However, since 2014, the top presenter on MSNBC has fluctuated on a monthly basis (Figure 7c). The 13% rise in screen time of news presenters on CNN that we saw earlier (Figure 6a) can largely be attributed to three hosts (Anderson Cooper, Chris Cuomo, and Don Lemon) who see 2.5×, 4.5×, and 5.5× increases in screen time from 2015 onwards (Figure 7a) and account for over a third of all news presenter screen time on CNN in 2019.

How does screen time of male- and female-presenting news presenters compare? The list of top news presenters by screen time is dominated by male-presenting individuals. Of the top five news presenters on each channel, accounting for 31% (CNN), 22% (FOX), and 34% (MSNBC) of news presenter screen time, only one is female-presenting on CNN and on FOX, and two are on MSNBC (Figure 7). Across all three channels, there is a shift towards gender parity in screen time of news presenters early in the decade followed by a divergence.

CNN exhibits gender parity for news presenters in January-June 2012 and May-August 2015 (Figure 6b). However, from September 2015 onward, CNN diverges as the 10% increase in the screen time of male-presenting news presenters (from 14% to 24%) outpaces the 3% increase for female presenters (13% to 16%). The increase in male-presenting news presenter screen time on CNN mirrors the increase in overall news presenter screen time on CNN due to an increase in the screen time for Anderson Cooper, Don Lemon, and Chris Cuomo (Figure 7a).

Similarly, the gender disparity of news presenters on FOX decreases from 2010 to 2016, but widens in 2017 due to an increase in the screen time of male-presenting news presenters (Figure 6c). This occurs around the time of the departures of former top hosts Megyn Kelly and Bill O’Reilly from FOX (6% and 5% of presenter screen time on FOX in 2016). Their time is replaced by a rise in Tucker Carlson’s and Sean Hannity’s screen time (3% and 5% of news presenter screen time on FOX in 2016, up to 11% and 7% in 2017 and 2018). The increase in female-presenting news presenter screen time in October 2017 occurs when Laura Ingraham’s Ingraham Angle and Shannon Bream’s FOX News @ Night debut.

On MSNBC, the disparity as a percentage of news presenter screen time increases from May 2017 to July 2019 (Figure 6d). This is due to a drop in the screen time of both male- and female-presenting news presenters, with the latter falling faster. The percentage of time when male-presenting news presenters are on screen falls from 17% to 13%, while the percentage for female-presenting news presenters falls from 14% to 7%. Unlike on CNN and FOX, the decline is more distributed across news presenters; the screen time of the top five presenters from 2017 to 2019 is comparatively flat (Figure 7c).

Which news presenters hog the screen time on their shows? We compute the percentage of time a news presenter is on screen on their own show (“screenhog score”) and plot the top 25 “screenhogs” (Figure 8). Chris Cuomo (CNN) has the highest fraction of screen time on his own show (visible 70.6% of the time on Cuomo Primetime), while Tucker Carlson (FOX) is second at 55.3% on Tucker Carlson Tonight. These results can be attributed to the format of these two shows; Cuomo and Carlson both do interviews and often show their own reactions to guests’ comments. Carlson also regularly monologues while on screen. Compared to both CNN and MSNBC, FOX has the most screenhogs (13 of the top 25), many of whom are well-known hosts of FOX’s opinion shows. The top presenters by channel, Bill O’Reilly, Anderson Cooper, and Rachel Maddow, also break the top 25, with screenhog scores of 28.5%, 28.3%, and 24.2%, respectively.
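The screenhog score reduces to one fraction per show; the sketch below (hypothetical data, not the authors' code) assumes each show's footage has been reduced to per-sampled-frame booleans indicating whether the show's own host is on screen:

```python
def screenhog_scores(shows):
    """shows: show name -> list of per-sampled-frame booleans, True when
    the show's own host is on screen in that frame. Returns (show, score)
    pairs sorted by the host's fraction of screen time on their own show,
    descending -- the ranking plotted in Figure 8."""
    scores = {name: sum(flags) / len(flags) for name, flags in shows.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical frames for two shows (names illustrative only).
ranked = screenhog_scores({
    'Show A': [True] * 7 + [False] * 3,   # host on screen 70% of the time
    'Show B': [True] * 4 + [False] * 6,   # host on screen 40% of the time
})
```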

Figure 8. The 25 news presenters that receive the largest fraction of screen time on their own show (“screenhogs”), and the total amount of video content for their show in the data set. The top two shows by this metric, Cuomo Primetime and Tucker Carlson Tonight, are relatively recent shows, starting in June 2018 and November 2016, respectively.
Figure 9. Blonde female news presenters consistently receive more screen time on FOX than non-blonde female news presenters. CNN catches up to FOX from 2014 onward, while the screen time of blonde female news presenters has risen since 2015 on MSNBC. On MSNBC, blonde female news presenters do not receive more screen time than non-blonde female news presenters. Because blonde and non-blonde female news presenters can be on screen at the same time, the lines in (a) and (b) can add to more than 100%.
Figure 10. The average age of news presenters, weighted by screen time, has increased on all three channels (bold lines). FOX has the highest average age for both male- and female-presenting news presenters.
Figure 11. In early coverage of the shooting of Trayvon Martin by George Zimmerman, all channels used the same photos of Martin and Zimmerman. However, as the story progressed, depictions of Trayvon (top) differed significantly across channels. Depictions of Zimmerman (bottom), also evolved over time, but largely reflect efforts by channels to use the most up-to-date photo of Zimmerman during legal proceedings.

What is the average age of news presenters? We obtain the birth date for each of the 325 news presenters from Wikipedia (wikipedia) and then compute the average age of news presenters on each channel when they are on screen (Supplemental 1.8). From 2010 to 2019, the average age of news presenters rises from 48.2 to 51.0 years (Figure 10). This trend is visible on all three channels, though there are localized reversals, often marked by retirements of older, prominent hosts; for example, the average news presenter’s age on CNN falls slightly after Larry King’s retirement in 2010 at age 76. Across all three channels, female-presenting news presenters are younger than their male-presenting counterparts by 6.3 years. However, the gap has narrowed in recent years.
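Weighting the average by screen time means a host who is on screen 1,200 hours counts proportionally more than one on screen 500 hours; a minimal sketch with hypothetical ages and hours:

```python
def weighted_average_age(entries):
    """entries: (age_in_years, screen_time_hours) pairs for one channel
    and year. Returns the screen-time-weighted average presenter age."""
    total_time = sum(hours for _age, hours in entries)
    return sum(age * hours for age, hours in entries) / total_time

# Hypothetical presenters: (age, screen time in hours).
entries = [(45, 1200.0), (60, 800.0), (38, 500.0)]
avg_age = weighted_average_age(entries)
```

The unweighted mean of these ages would be about 47.7; weighting pulls the result toward the ages of the presenters with the most airtime.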

Are female-presenting news presenters disproportionately blonde? We manually annotated the hair color (blonde, brown, black, other) of 145 female news presenters and computed the screen time of these groups (Supplemental 1.9). We find that blonde news presenters account for 64.7% of female-presenting news presenter screen time on FOX (compared to 28.8% for non-blonde news presenters), giving credence to the stereotype that female-presenting news presenters on FOX fit a particular aesthetic that includes blonde hair (advanced, for example, in The Guardian (foxisblonde)). Counter to this stereotype, however, FOX is not alone; the proportion of blonde news presenters on CNN (56.6% overall, 58.2% since 2015, compared to 38.6% overall for non-blondes) has risen, and currently the chance of seeing a blonde female news presenter is approximately equal on the two networks (Figure 9). The screen time of blonde female presenters is lower on MSNBC (36.6%), while non-blonde female news presenters account for 55.7%. On MSNBC, brown is the dominant hair color (40.8%), but a large share (21.4%) is due to a single brown-haired host (Rachel Maddow). On all three channels, the percentage of blonde female news presenters far exceeds the natural rate of blondness in the U.S. (% according to the Bureau of Labor Statistics (nlsy79)).

3. How are individuals portrayed?

Editorial decisions about the images and graphics to include with stories can subtly influence the way viewers understand a story. We examine such editorial choices in the context of the Trayvon Martin shooting.

Which photos of Trayvon Martin and George Zimmerman appeared most often on each channel? On February 26, 2012, Trayvon Martin, a 17 year-old high-school student, was fatally shot by neighborhood watchman George Zimmerman (trayvonshootingfacts). Media depictions of both Martin and Zimmerman were scrutinized heavily as the story captured national interest (trayvonphotos; foxtrayvon). We identified unique photographs of Martin and Zimmerman in our data set using a K-NN classifier on FaceNet descriptors (facenet) and tabulated the screen time of these photos (see Supplemental 1.10).

Figure 11 shows the four photos of Martin (top row) and Zimmerman (bottom row) that received the most screen time in the aftermath of the shooting and during Zimmerman’s 2013 trial. In the initial week of coverage, all three channels used the same image of Martin (purple). This image generated significant discussion about the “baby-faced” depiction of Martin, although it was taken only a few months before the shooting. In the ensuing weeks (and later during Zimmerman’s trial), differences in how the three channels depict Martin emerge. CNN most commonly used a photograph of Martin smiling in a blue hat (blue box). In contrast, the most commonly shown photo on FOX depicts an unsmiling Martin (orange). MSNBC most frequently used the black-and-white image of Martin in a hoodie (pink) that became the symbol for protests in support of Trayvon and his family. The three different images reflect significant differences in editorial decisions made by the three channels.

Depictions of Zimmerman also evolved with coverage of the shooting, reflecting both efforts by channels to use the most up-to-date photos for the story at hand and editorial choices. All three channels initially aired the same image of Zimmerman (purple). The photo, depicting Zimmerman in an orange polo shirt, was both out of date and taken from a prior police incident unrelated to the Martin shooting. A more recent photograph of Zimmerman (pink) was made available to news outlets in late March 2012. While FOX and CNN transitioned to showing this new photo, which depicts a smiling Zimmerman, the majority of the time, MSNBC continued to give more screen time to the original photo. After mid-April 2012, depictions of Zimmerman on all three channels primarily show him in courtroom appearances as the legal proceedings unfolded.

4. What is discussed in the news?

The amount of coverage that topics receive in the news can influence viewer perceptions of world events and newsworthy stories. As a measure of the frequency with which key topics are discussed, we count the number of times selected words appear in video captions.

Figure 12. Some countries receive more attention in U.S. cable TV news in aggregate than others. Russia is the largest outlier, followed by Iran.

Figure 13. Major peaks in mentions of foreign countries occur around disasters and crises. Since the start of Trump’s presidency, there has been an increase in coverage of Russia, China, and North Korea due to increased tensions and a marked shift in U.S. foreign policy (shaded).

How often are foreign countries mentioned? Foreign country names, defined in Supplemental 1.11, appear in the captions a total of 4.5M times. Most countries receive little coverage (Figure 12), and the eight countries with the highest number of mentions (Russia, Iran, Syria, Iraq, China, North Korea, Israel, and Afghanistan) account for 51% of all country mentions. Russia alone accounts for 11.2%. (If treated as a country, ISIS would rank 2nd after Russia at 8.4%.) Of these eight, five have been in a state of armed conflict in the last decade, while the other three have had major diplomatic rifts with the U.S. These data suggest that military conflict and tense U.S. relations beget coverage. No countries from South America or Southeast Asia appear in the top eight; the top countries from these regions are Venezuela (32nd) and Vietnam (25th). Mexico, which frequently appears due to disputes over immigration and trade, is 9th, while Canada is 21st.
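Counting country mentions in captions is a straightforward tally; this sketch uses a tiny hypothetical name list restricted to single-word names (the paper's full list, with multi-word names and aliases, is defined in its Supplemental 1.11):

```python
import re
from collections import Counter

# Hypothetical subset of the country-name list (single-word names only).
COUNTRIES = {'russia', 'iran', 'china', 'mexico'}

def country_mentions(captions):
    """captions: iterable of caption lines. Returns a Counter of
    country-name mentions, case-insensitive."""
    counts = Counter()
    for line in captions:
        for word in re.findall(r'[a-z]+', line.lower()):
            if word in COUNTRIES:
                counts[word] += 1
    return counts

caps = ['Russia and China met today.', 'RUSSIA again; Iran too.']
mentions = country_mentions(caps)
```

Binning the same counts by month gives the time series annotated in Figure 13.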

Mentions of countries often peak due to important events. Figure 13 annotates these events for the 15 most often mentioned countries. For example, the Libyan Civil War in 2011, the escalation of the Syrian Civil War in 2012-2013, and the rise of ISIS (Syria, Iraq) in 2014 correspond to peaks. The countries ranked 11 to 15 are otherwise rarely in the news, but the 2011 tsunami and Fukushima Daiichi nuclear disaster, the 2014 annexation of Crimea by Russia, and the Charlie Hebdo shooting and November Paris attacks (both in 2015) elevated Japan, Ukraine, and France to brief prominence.

Following the election of Donald Trump in 2016, there has been a marked shift in the top countries, corresponding to topics such as Russian election interference, North Korean disarmament talks, the Iran nuclear deal, and the trade war with China.

Figure 14. Following a major terrorist attack, mass shooting, or plane crash, usage of related terms increases and remains elevated for 2-3 weeks before returning to pre-event levels. A few plane crashes continued to be covered after this period as new details about the crash or disappearance (in the case of MH370) emerged. In the figure above, lines for individual events are terminated early if another, unrelated, event of the same category occurs; for example, the San Bernardino shooting (a terrorist attack) in December 2015 occurred three weeks after the November 2015 Paris attacks.

For how long do channels cover acts of terrorism, mass shootings, and plane crashes? We enumerated 18 major terrorist attacks (7 in the U.S. and 11 in Europe), 18 mass shootings, and 25 commercial airline crashes in the last decade, and we counted related N-grams such as terror(ism,ist), shoot(ing,er), and plane crash in the weeks following these events (Supplemental 1.12 gives the full lists of terms). Counts for terrorism and shootings return to the pre-event average after about two weeks (Figure 14a,b,c). Likewise, coverage of plane crashes also declines to pre-crash levels within two weeks (Figure 14d), though there are some notable outliers. Malaysia Airlines Flight 370, which disappeared over the Indian Ocean in 2014, remained in the news for nine weeks, and Malaysia Airlines Flight 17, shot down over eastern Ukraine, received coverage for four weeks as more details emerged, leading to a subsequent peak in coverage.
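The week-by-week decay above can be measured by bucketing term mentions relative to the event date; a minimal sketch with hypothetical `(date, term)` caption hits and an illustrative term set:

```python
from datetime import date
from collections import Counter

def weekly_counts(mentions, event_day, terms, weeks=6):
    """mentions: (date, term) caption hits. Returns counts of the given
    terms for each of the first `weeks` weeks on or after `event_day`,
    from which the decay back to pre-event levels can be read."""
    counts = Counter()
    for day, term in mentions:
        if term in terms:
            week = (day - event_day).days // 7
            if 0 <= week < weeks:
                counts[week] += 1
    return [counts[w] for w in range(weeks)]

# Hypothetical hits around a hypothetical event date.
event = date(2015, 11, 13)
hits = [(date(2015, 11, 14), 'terror'), (date(2015, 11, 15), 'attack'),
        (date(2015, 11, 25), 'terror'), (date(2015, 12, 30), 'terror'),
        (date(2015, 11, 16), 'weather')]
series = weekly_counts(hits, event, {'terror', 'attack'}, weeks=4)
```

The figure's early termination rule (cutting a line when an unrelated same-category event occurs) would simply shorten `weeks` for that event.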

Is it illegal or undocumented immigration? “Illegal immigrant” and “undocumented immigrant” are competing terms that describe individuals who are in the U.S. illegally, with the latter term seen as more politically correct (illegalvsundoc). Figure 15 shows monthly counts of variants of these terms (Supplemental 1.13 gives the full list of variants). Illegal is used most often on FOX (59K times); FOX also has more mentions of immigration overall. From 2012 onward, undocumented has increased in use on CNN and MSNBC, though illegal still appears equally or more often on these channels than undocumented.

Figure 15. Counts of “illegal immigrant” and “undocumented immigrant” terminology in video captions, by month. Illegal is more common than undocumented on all three channels, but FOX uses it the most. Undocumented only comes into significant use from 2012 onward.

How often are honorifics used to refer to Presidents Trump and Obama? Honorifics convey respect for a person or office. We compare the number of times that President (Donald) Trump is used to the number of other mentions of Trump’s person (e.g., Donald Trump, or just Trump). When computing the number of mentions of just Trump, we exclude references to nouns such as the Trump administration and Melania Trump that contain the word Trump but do not refer to Donald Trump (Supplemental 1.14 gives the full list of exclusions).
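The exclusion-then-count procedure can be sketched as follows; the exclusion list here is a small hypothetical subset (the paper's full list is in its Supplemental 1.14), and the regexes are illustrative rather than the authors' actual patterns:

```python
import re

# Hypothetical subset of phrases that contain "Trump" but do not refer
# to Donald Trump himself.
EXCLUDE = ['trump administration', 'melania trump', 'trump tower']

def honorific_counts(caption):
    """Returns (mentions with the honorific 'President',
    mentions without it) for Donald Trump in one caption string."""
    text = caption.lower()
    for phrase in EXCLUDE:          # drop non-person references first
        text = text.replace(phrase, ' ')
    with_title = len(re.findall(r'president (donald )?trump', text))
    total = len(re.findall(r'\btrump\b', text))
    return with_title, total - with_title

cap = ('President Trump said Trump Tower is fine; the Trump '
       'administration and Donald Trump agreed.')
with_honorific, without = honorific_counts(cap)
```

Running the same tally per month, per channel, gives the series in Figure 16.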

The term President Trump only emerges on all three channels following his inauguration to the office in January 2017 (Figure 16a-c). President is used nearly half of the time on CNN and FOX after his inauguration. By contrast, MSNBC continues to most commonly refer to him as Trump, without using the honorific term President. We plot similar charts for President Obama over the course of his presidency from 2010 to January 2017 (Figure 16d-e) and find that, on all three channels, the honorific term President is used more often than not. Also, we find that Trump, in general, is mentioned approximately 3× as often as Obama on a monthly basis during the periods of their respective presidencies in our data set. These data suggest that although coverage of the incumbent president has increased since the start of Trump’s presidency in 2017, the level of formality when referring to the president has fallen.

Figure 16. Counts of Trump and Obama peak in election years (2016 and 2012). After his inauguration, Trump is referred to more often without President than with (MSNBC has the largest gap). By contrast, Obama is referred to with President more often than not. The channel color-coded lines represent the total counts of Trump and Obama, without exclusions such as the Trump administration, etc. Note that most of these counts are captured by the N-grams that we identified as references to Trump’s and Obama’s persons.

5. Who is on screen when a word is said?

People are often associated with specific topics discussed in cable TV news. We analyze the visual association of faces to specific topics by computing how often faces are on screen at the same time that specific words are mentioned. We obtain millisecond-scale time alignments of caption words with the video’s audio track using the Gentle word aligner (gentlealigner) (Supplemental 1.1).

Which words are most likely to be said when women are on screen? By treating both face detections and words as time intervals, we compute the conditional probability of observing at least one female-presenting (or one male-presenting) face on screen given each word in the caption text (Supplemental 1.15). This conditional probability can be viewed analogously to TF-IDF weighting (manningnlp), where the term frequency is the number of co-occurrences of the word and the individual’s face, and the document frequency is the total number of times a word is said. Because of the gender imbalance in screen time, the conditional probability of a female-presenting face being on screen when any word is said is 29.6%, compared to 61.4% for male-presenting faces, so we are interested in words where the difference between female and male probabilities deviates from the baseline 31.9% difference.
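The interval-based conditional probability can be sketched directly: a word "co-occurs" with a face group if its aligned time span overlaps any on-screen span for that group. The data below is hypothetical, and the half-open overlap test is one reasonable choice, not necessarily the authors':

```python
def overlaps(span, intervals):
    """True if (start, end) span overlaps any interval in `intervals`."""
    s, e = span
    return any(s < e2 and s2 < e for s2, e2 in intervals)

def p_face_given_word(word_spans, face_intervals, word):
    """word_spans: (word, (start_s, end_s)) pairs from caption-audio
    alignment. face_intervals: (start_s, end_s) spans when a face of
    the group of interest (e.g., female-presenting) is on screen.
    Returns P(face on screen | word is said)."""
    hits = [span for w, span in word_spans if w == word]
    if not hits:
        return 0.0
    on_screen = sum(1 for span in hits if overlaps(span, face_intervals))
    return on_screen / len(hits)

# Hypothetical aligned words and on-screen spans (seconds).
words = [('breast', (10.0, 10.5)), ('breast', (100.0, 100.4)),
         ('treaty', (50.0, 50.6))]
female_spans = [(9.0, 12.0), (49.0, 52.0)]
p = p_face_given_word(words, female_spans, 'breast')
```

Computing this probability for every word, for both groups, and ranking by the difference yields the word distribution plotted in Figure 17.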

Figure 17 shows the top 35 words most associated with male- and female-presenting faces on screen. For female-presenting faces, the words are about women's health (e.g., breast, pregnant); family (e.g., boyfriend, husband, mom(s), mothers, parenthood, etc.); and female job titles (e.g., actress, congresswoman). Weather-related terms (e.g., temperatures, meteorologist, blizzard, tornadoes) and business news terms (e.g., futures, Nasdaq, stocks, earnings) are also at or near gender parity; we attribute this to a number of prominent female weatherpersons (Indra Petersons/CNN, Janice Dean/FOX, Maria Molina/FOX) and female business correspondents (Christine Romans/CNN, Alison Kosik/CNN, JJ Ramberg/MSNBC, Stephanie Ruhle/MSNBC, Maria Bartiromo/FOX) across much of the last decade. By contrast, the top words associated with male-presenting faces on screen are about foreign affairs, terrorism, and conflict (e.g., ISIL, Israelis, Iranians, Saudis, Russians, destroy, treaty); and fiscal policy (e.g., deficits, trillion, entitlement(s)). The stark difference in the words associated with female-presenting screen time suggests that, over the last decade, the subjects discussed on-air by presenters and guests varied strongly depending on their gender.

Figure 17. The distribution of words by difference in conditional probability of a female- versus a male-presenting face being on screen (Supplemental 1.15). The 35 words that are most associated with male- and female-presenting screen time are annotated. Note the stark differences in topic representation in the top male and female associated words: foreign policy, conflict, and fiscal terms (male); and female health, family, weather, and business news terms (female).

Who uses unique words? We define vocabulary to be “unique” to a person if the probability of that individual being on screen conditioned on the word being said (at the same time) is high. Table 1 lists all words for which an individual has a greater than 50% chance of being on screen when the word is said. (We limit analysis to words mentioned at least 100 times.) Political opinion show hosts (on FOX and MSNBC) take the most creative liberty in their words, accounting for all but three names in the list.

Person Unique words (%)
Bill O’Reilly (FOX) opine (60.6), reportage (59.0), spout (58.6),
urchins (57.9), pinhead[ed,s] (49.0, 51.5, 50.2)
Ed Schultz (MSNBC) classers (71.2), beckster (61.6),
drugster (59.9), righties (55.2),
trenders (60.8), psychotalk (54.2)
Tucker Carlson (FOX) pomposity (76.2), smugness (71.5),
groupthink (70.5)
Sean Hannity (FOX) abusively (76.1), Obamamania (53.3)
Glenn Beck (FOX) Bernays (82.3), Weimar (62.2)
Rachel Maddow (MSNBC) [bull]pucky (47.9, 50.7), debunktion (51.4)
Chris Matthews (MSNBC) rushbo (50.5)
Kevin McCarthy (politician) untrustable (75.9)
Chris Coons (politician) Delawareans (63.8)
Hillary Clinton (politician) generalistic (56.5)
Table 1. Unique words are often euphemisms or insults (urchins = children, beckster = Glenn Beck, drugster/rushbo = Rush Limbaugh, righties = conservatives, etc.). Others are the names of show segments or slogans. For example, Psychotalk is a segment of the Ed Show; Sean Hannity refers to the liberal media as Obamamania media; and Tucker Carlson brands his own show as the “sworn enemy” of lying, pomposity, smugness, and groupthink. Some rare words become unique due to being replayed often on the news; for example, Kevin McCarthy (U.S. representative) calls Hillary Clinton untrustable, and Hillary Clinton uses generalistic in the same sentence as her infamous phrase branding Trump’s supporters as a “basket of deplorables”.
Figure 18. The percentage of time president is said with Trump increases for hosts after Trump’s inauguration. Chris Cuomo (CNN) drops from over 40% to under 20% in June 2018 with his transition from hosting New Day to Cuomo Primetime. Sean Hannity’s (FOX) decline is more gradual over the course of Trump’s presidency. From 2017 onward, Wolf Blitzer (CNN) is consistently above the other top hosts on any of the three channels (averaging 72%).
Figure 19. Percentage of mentions that use the president honorific for Trump (post-inauguration to January 20, 2017) and Obama (before January 20, 2017) by each news presenter (dots). A majority of presenters on all three channels use president a higher fraction of time when mentioning Obama than they do with Trump. The presenters with the highest screen time on each channel are annotated.
Figure 20. Hillary Clinton is on screen up to 33% of the time when email(s) is mentioned (11% on average from 2015 to 2016). This is significantly higher than the percentage of time that Clinton is on screen when any word is said (1.9% on average in the same time period).

Which presenters are on screen when the President honorific is said? A news presenter’s use of the President honorific preceding Trump or Obama might set a show’s tone for how these leaders are portrayed. When a presenter is on screen, we find that the honorific term President is used a greater percentage of time for Obama than for Trump, during the period of their presidencies (Figure 19). On all three channels, most presenters lie below the parity line. However, the average FOX presenter is closer to parity between uses of the term President to refer to Trump and Obama (a few FOX presenters lie above the line) than the average presenter on CNN and MSNBC. Figure 18 shows how the top hosts (by screen time) on each channel are associated with uses of President to refer to Trump over time.

How much was Hillary Clinton’s face associated with the word email? Hillary Clinton’s emails were a frequent news topic in 2015 and during the 2016 presidential election due to investigations of the 2012 Benghazi attack and her controversial use of a private email server while U.S. Secretary of State. During this period, Clinton’s face was often on screen when these controversies were discussed, visually linking her to the controversy. We compute that during the period spanning 2015 to 2016, Hillary Clinton’s face is on screen during 11% of mentions of the word email(s) (Figure 20), a significantly higher percentage than the 1.9% of the time that she is on screen overall. This degree of association is similar across all three channels (Supplemental 2.3.1).

6. Interactive Visualization Tool

Figure 21. Our interactive visualization tool supports time-series analysis of the cable TV news data set. Users define queries using a combination of face, caption text, and video metadata filters. The tool generates time-series plots of the total amount of video (aggregate screen time) matching these queries (left). To provide more context for the segments of video included in the chart, users can click on the chart to bring up the videos matching the query (right). We have found that providing direct access to the videos is often essential for debugging queries and better understanding the relevant video clips.

We have developed an interactive, web-based visualization tool (available at https://tvnews.stanford.edu) that enables the general public to perform analyses of the cable TV news data set (Figure 21). Our design, inspired by the Google N-gram Viewer (ngrams), generates time-series line charts of the amount of cable TV news video (aggregate time) matching user-specified queries. Queries may consist of one or more filters which select intervals of time when a specific individual appears on screen (name="..."), an on screen face has a specific presented gender (tag="male"), a keyword or phrase appears in the video captions (text="..."), or the videos come from a particular channel (channel="CNN"), program, or time of day. Clicking on the graph allows users to view the videos matching the query.

To construct more complex analyses, the tool supports queries containing conjunctions and disjunctions of filters, which serve to intersect or union the video time intervals matched by individual filters (name="Hillary Clinton" AND text="email" AND channel="FOX"). We implemented a custom in-memory query processing system to execute screen time aggregation queries over the entire cable TV news data set while maintaining interactive response times for the user. In addition to generating time-series plots of video time, the tool enables users to directly view video clips (and their associated captions) that match queries.
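The core of such a query system is interval set algebra. The sketch below shows one way a conjunction (AND) of two filters could be evaluated, assuming each filter has already been resolved to a sorted list of (start, end) time intervals; this is an illustrative two-pointer sweep, not our actual implementation.

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of (start, end) intervals.

    Mirrors the semantics of AND in a query like
    name="Hillary Clinton" AND text="email": the result is the time
    during which both filters match.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# Aggregate screen time matching both filters is the summed duration:
clips = intersect([(0, 10), (20, 30)], [(5, 25)])
print(clips, sum(e - s for s, e in clips))
```

A disjunction (OR) is the analogous union-with-merge, so arbitrary compositions of filters reduce to repeated applications of these two operations.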

A major challenge when developing this tool was making an easy-to-use, broadly accessible data analysis interface, while still exposing sufficient functionality to support a wide range of analyses of who and what appears on cable TV news. We call out three design decisions made during tool development.

(1) Limit visualization to time-series plots. Time-series analysis is a powerful way to discover and observe patterns over the decade spanned by the cable TV news data set. While time-series analysis does not encompass the full breadth of analyses presented in this paper, we chose to focus the visualization tool’s design on the creation of time-series plots to encourage and simplify this important form of analysis.

(2) Use screen time as a metric. We constrain all queries, regardless of whether visual filters or caption text filters are used, to generate counts of a single metric: the amount of screen time matching the query. While alternative metrics, such as using word counts to analyze caption text (section 4) or counts of distinct individuals to understand who appears on a show, may be preferred for certain analyses, we chose screen time because it is well suited to many analyses focused on understanding representation in the news. For example, a count of a face’s screen time directly reflects the chance a viewer will see a face when turning on cable TV news. Also, word counts can be converted into screen time intervals by attributing each instance of a word, regardless of its actual temporal extent, to a fixed interval of time (textwindow="..."). As a result, our tool can be used to effectively perform comparisons of word counts as well.
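This word-to-interval conversion can be sketched as dilating each word timestamp to a fixed window and merging overlaps so that duration is not double-counted. The function and window size below are illustrative, mirroring the textwindow="..." idea rather than reproducing our exact implementation.

```python
def words_to_intervals(word_times, window=15.0):
    """Attribute each word instance a fixed window centered on its
    timestamp, then merge overlapping windows into disjoint intervals
    so screen time is not double-counted.
    """
    intervals = sorted((t - window / 2, t + window / 2) for t in word_times)
    merged = []
    for s, e in intervals:
        if merged and s <= merged[-1][1]:
            # Overlaps the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The resulting intervals behave like any other filter's output, so caption-text filters compose freely with face and metadata filters.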

Our decision to make all filters select temporal extents simplified the query system interface. All filters result in a selection of time intervals, allowing all filters to be arbitrarily composed in queries that combine information from face identity labels and captions. A system where some filters yielded word counts and others yielded time intervals would complicate the user experience, as it would introduce the notion of different data types into queries.

(3) Facilitate inspection of source video clips. We found it important for the visualization tool to support user inspection of the source video clips that match a query (Figure 21-right). Video clip inspection allows a user to observe the context in which a face or word appears in a video. This context in turn is helpful for understanding why a clip was included in a query result, which facilitates deeper understanding of trends being investigated, aids the process of debugging and refining queries, and helps a user assess the accuracy of the automatically generated video labels relied on by a query.

7. Limitations and Discussion

Annotating video using machine learning techniques enables analysis at scale, but it also presents challenges due to the limitations of automated methods. Most importantly, the labels generated by computational models have errors, and understanding the prevalence and nature of labeling errors (including forms of bias) is important to building trust in analysis results. Labeling errors also have the potential to harm individuals that appear in cable TV news, in particular when related to gender or race (excavatingai; imagenetbias; buolamwini:2018:gendershades). As a step toward understanding the accuracy of labels, we validated the output of our face and commercial detection, presented gender estimation, and person identification models (for a small subset of individuals) against human-provided labels on a small collection of frames. The details of this validation process, and the measured accuracy of models, are provided in supplemental material.

Despite errors in our computational labeling methods at the individual level, aggregate data about gender representation over time on cable TV news is useful for understanding gender disparities. Many questions about representation in cable TV news media similarly concern the subject of race, but we are unaware of any computational model that can accurately estimate an individual’s race from their appearance (models we have seen have much lower accuracy than models for estimating presented gender). However, it may be possible to automatically determine the race of individuals for whom we have an identity label by using external data sources to obtain the individual’s self-reported race. A similar procedure could also be used to obtain the self-identified gender of an individual, reducing our reliance on estimating presented gender from their appearance. Such approaches could further improve our understanding of race and gender in cable TV news.

While our query system can determine when a specific individual’s face is on screen when a word is spoken, it does not perform automatic speaker identification. As a result, the on screen face may not be speaking, e.g., when a news presenter delivers narration over silent B-roll footage. Extending our system to perform automatic speaker identification (Ephrat:2018:LookToListen) would allow it to directly support questions about the speaking time of individuals in news programs or about which individuals spoke about what stories.

Our system lacks mechanisms for automatically differentiating different formats of face appearances. For example, an individual’s face may be on screen because they are included in infographics, directly appearing on the program (as a contributor or guest), or shown in B-roll footage. The ability to differentiate these cases would enable new analyses of how the news covers individuals.

Finally, we believe adding the ability to identify duplicate clips in the data set would prove to be useful in future analyses. For example, duplicate clips can signal re-airing of programs or replaying of popular sound bites. We would also like to connect analyses with additional data sources such as political candidate polling statistics (fivethirtyeight), as well as the number and demographics of viewers (nielsen). Joining in this data would enable analysis of how cable TV news impacts politics and viewers more generally. We are working with several news organizations to deploy private versions of our tool on their internal video archives.

8. Related Work

Manual analysis of news and media. There have been many efforts to study trends in media presentation, ranging from analysis of video editing choices (hallin1992; barnhurst1997; bucy2007; quickerdarker), coverage of political candidates (taiwanbias), and prevalence of segment formats (e.g., interviews (interviewfrequency)), to representation by race and gender (bbc5050; gmmp; WMC:2017:womeninmedia; MediaMatters:2016:diversity). These efforts rely on manual annotation of media, which limits analysis to small amounts of video (e.g., a few hundred shows (hallin1992; bucy2007), five Sunday morning news shows (MediaMatters:2016:diversity)) or even to anecdotal observations of a single journalist (foxtrayvon; trumpfreemedia). The high cost of manual annotation makes studies at scale rare. For example, the BBC 50:50 Project (bbc5050), which audits gender representation in news media, depends on self-reporting from newsrooms across the world. GMMP (gmmp) relies on a global network of hundreds of volunteers to compile a report on gender representation every five years. While automated techniques cannot generate the same variety of labels as human annotators (GMMP requires a volunteer to fill out a three-page form for each story they annotate (gmmp)), we believe annotation at scale using computational techniques stands to complement these manual efforts.

Automated analysis of media. Our work was heavily inspired by the Google N-gram viewer (ngrams) and Google Trends (GOOGtrends), which demonstrate that automated computational analysis of word frequency, when performed at scale (to centuries of digitized books, or the world’s internet search queries) can serve as a valuable tool for studying trends in culture. These projects allow the general public to conduct analyses by creating simple time series visualizations of word frequencies. We view our work as bringing these ideas to cable TV news video.

Our system is similar to the GDELT AI Television Explorer (gdelt_ai_explorer), which provides a web-based query interface for caption text and on screen chyron text in the Internet Archive’s cable TV news data set, and recently added support for queries for objects appearing on screen. Our work analyzes nearly the same corpus of source video, but unlike GDELT we label the video with information about the faces on screen. We believe information about who is on screen is particularly important in many analyses of cable TV news media, such as those in this paper.

In general, there is growing interest in using automated computational analysis of text, images, and videos to facilitate understanding of trends in media and the world. This includes mining print news and social media to predict civil unrest (embers; embers4y) and forced population migration (forcedmigration), using facial recognition on TV video streams to build connectivity graphs between politicians (japanfaces), using gender classification to quantify the lack of female representation in Hollywood films (geenadavis), understanding presentation style and motion in “TED talk” videos (huaminvideo; huaminemoco), identifying trends in fashion (Matzen:2017:StreetStyle; Ginosar:2017:yearbook) from internet images, or highlighting visual attributes of cities (Doersch:2012:Paris; arietta2014city). These serve as interesting examples of the types of future analyses that could be performed on our cable TV news dataset.

Time series visualizations of word and document frequencies are commonly used to show changes in patterns of cultural production (epoch), and we take inspiration from advocates of “distant reading,” who make use of these visual representations to allow for insights that are impossible from manual inspection of document collections (moretti).

Alternative approaches for video analysis queries. A wide variety of systems exist for interactive video analysis, and existing work in interaction design has presented other potential approaches to formulating queries over video data sets. Video Lens (videolens) demonstrates interactive filtering using brushing and linking to filter complex spatio-temporal events in baseball video. The query-by-example approach (querybyexample) has been used in the image (cao2010mindfinder; ebaysearch; pinterestsearch; conceptcanvas) and sports (sha2016chalkboarding; sha2017fine) domains. These example-based techniques are less applicable for our visualization tool, which focuses on letting users analyze who and what is in cable TV news; specifying a query by typing a person’s name or the keywords in the caption is often easier for users than specifying these attributes by example.

Other works from Höferlin, et al. (facetedexploration) and Meghdadi, et al. (activeshotsummary) propose interactive methods to cluster and visualize object trajectories to identify rare events of interest in surveillance video. Analyzing motion-based events (e.g., hand gestures) in TV news is an area of future work, but our current analyses target more visually static elements such as faces and their identities.

9. Conclusion

We have conducted a quantitative analysis of nearly a decade of U.S. cable TV news video. We demonstrate that automatically generated video annotations, such as annotations for when faces are on screen and when words appear in captions, can facilitate analyses that provide unique insight into trends in who and what appears in cable TV news. To make analysis of our data set accessible to the general public, we have created an interactive screen time visualization tool, which ingests new video on a daily basis, and which allows users to describe video selection queries and generate time-series plots of screen time. We are excited to launch the tool to the general public, and we hope that it encourages further analysis and insight into the presentation of this important form of news media.

Acknowledgements.
This material is based upon work supported by the National Science Foundation (NSF) under IIS-1539069 and III-1908727. This work was also supported by financial and computing gifts from the Brown Institute for Media Innovation, Intel Corporation, Google Cloud, and Amazon Web Services. We thank the Internet Archive for providing their data set for academic use. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.

References