Cable TV news reaches millions of U.S. households each day, and profoundly influences public opinion and discourse on current events (tvnewsPew). While cable TV news has been on air for over 40 years, there has been little longitudinal analysis of its visual aspects. As a result, we have little understanding of who appears on cable TV news and what these individuals talk about.
Consider questions like: What is the screen time of men vs. women? Which political candidates and news presenters receive the most screen time? How are victims and perpetrators of violence portrayed? Which foreign countries are discussed the most? Who is on screen when different topics are discussed?
In this paper, we demonstrate that it is possible to answer such questions by analyzing a data set comprising nearly 24/7 coverage of video, audio and text captions from three major U.S. cable TV news channels – CNN, FOX (News) and MSNBC – over the last decade (January 1, 2010 to July 23, 2019). The data set was collected by the Internet Archive’s TV News Archive (tvnewsarchive) and in total includes 244,038 hours (equivalent to about 27.8 years) of footage. Using automated machine learning tools, we label the data set – e.g., we detect faces, label their presented gender, identify prominent public figures, and align text captions to audio. These labels allow us to detect 68,179 hours of commercials (27.9% of the video), leaving 175,858 hours (72.1%) of news programming (Figure 1a). In this paper, we focus on analyzing the news programming part of this data set.
Each of the resulting labels has a temporal extent, and we use these extents to compute the screen time
of faces and identify when faces are on screen and when words are said. We show that, by analyzing the screen time of faces, counting words in captions, and presenting results in the form of time-series plots, we can reveal a variety of insights, patterns, and trends in the data. To this end, we adopt an approach similar to the Google N-gram viewer (ngrams), which demonstrated to many disciplines the usefulness of word frequency analysis of 5.2 million books and print media from 1800 to 2000, as well as to the GDELT AI Television Explorer (gdelt_ai_explorer), which enables analysis of cable TV news captions and on-screen objects (but not people). The goal of our work is to enable similar analyses of cable TV news video using labels that aid understanding of who is on screen and what is in the captions.
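Concretely, a label's screen time can be computed as the total duration of the union of its temporal extents, so that overlapping detections are not double-counted. The sketch below is a simplified illustration of this idea (not our actual pipeline code), with hypothetical detection extents:

```python
from typing import List, Tuple

def screen_time(intervals: List[Tuple[float, float]]) -> float:
    """Total seconds covered by the union of (start, end) intervals,
    merging overlaps so the same moment is not counted twice."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:  # disjoint: close the current run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                   # overlapping: extend the run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Hypothetical face-detection extents (in seconds) within one video.
detections = [(0.0, 3.0), (2.0, 6.0), (10.0, 12.0)]
print(screen_time(detections))  # 8.0
```

The same union-of-intervals computation applies whether the extents come from face detections, identity labels, or caption word alignments.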
Our work makes two main contributions.
We demonstrate that analyzing a decade of cable TV news video generates a variety of insights on a range of socially relevant issues, including gender balance (section 2), visual bias (section 3), topic coverage (section 4) and news presentation (section 5). The details of our complete data processing and labeling pipeline used for these analyses are described in Supplemental 1.
We present an interactive, web-based data analysis interface, akin to the Google N-gram viewer and GDELT AI Television Explorer, that allows users to easily formulate their own analysis queries on our annotated data set of cable TV news (section 6). Our analysis interface is publicly accessible at https://tvnews.stanford.edu
and it updates daily with new cable TV news video. Our data processing code will be made available as open source.
2. Who is in the news?
People are an integral part of the news stories that are covered, how they are told, and who tells them. We analyze the screen time and demographics of faces in U.S. cable TV news.
How much of the time is at least one face on screen? We detect faces using the MTCNN (mtcnn) face detector on frames sampled every three seconds (Supplemental 1.3). Face detections span a wide range of visual contexts, ranging from in-studio presenters and guests to people in B-roll footage and static infographics. Overall, we detect 263M total faces, and at least one face appears on screen 75.3% of the time. The percentage of time with a face on screen has risen steadily from 72.9% in 2010 to 81.5% in 2019, and is similar across all three channels (Figure 2).
We also observe an increase in the average number of faces on screen. On CNN and FOX the amount of time when only one face is on screen has declined, while it has remained constant on MSNBC. On all three channels, the amount of time when multiple faces (2 or more) are on screen simultaneously has risen. This accounts for the overall increase in time when at least one face is on screen, though we do not analyze which types of content (with no faces on screen) this footage is replacing. We note that while the average number of faces has increased in news content, the average number of faces on screen in commercials has remained flat since 2013 (Supplemental 2.1.1).
How does the screen time of male-presenting individuals compare to that of female-presenting individuals? We estimate the presented gender of each detected face using a classifier trained on FaceNet (facenet) descriptors (Supplemental 1.4). Overall, female-presenting faces are on screen 28.7% of the time, while male-presenting faces are on screen 60.2% of the time, a 0.48-to-1 ratio (Figure 3). These percentages are similar across channels, and have slowly increased for both groups (similar to how the percentage of time any face is on screen has increased). The ratio of female- to male-presenting screen time has increased from 0.41 to 0.54 over the decade (Figure 1b). While the upward trend indicates movement towards gender parity, the rate of change is slow, and these results also reinforce prior observations on the under-representation of women in both film (geenadavis) and news media (gmmp).
We acknowledge that our simplification of presented gender to a binary quantity fails to represent transgender or gender nonconforming individuals (keyes2018; hamidi2018). Further, an individual’s presented gender may differ from their actual gender identification. Despite these simplifications, we believe that automatically estimating binary presented gender labels is useful to improving understanding of trends in gender representation in cable TV news media.
Which public figures receive the most screen time? We estimate the identity of faces detected in our data set using the Amazon Rekognition Celebrity Recognition API (amazonrekognition). For individuals that are not currently included (or not accurately detected) by the API, we train our own classifiers using FaceNet (facenet) descriptors (Supplemental 1.5). We identify 1,260 unique individuals that receive at least 10 hours of screen time in our data set. These individuals account for 47% of the 263M faces that we detect in the news content and are on screen for 45% of screen time. The top individual is Donald Trump, who rises to prominence in the 2016 presidential campaigning season and from 2017 onward during his presidency (Figure 1c). Barack Obama is second, with 0.63 times Trump’s screen time, and is prevalent between 2010 (the start of the data set) and 2017 (the end of his second term). Besides U.S. presidents, the list of top individuals is dominated by politicians and news presenters (e.g. anchors, daytime hosts, field reporters, etc.) (Figure 4).
How much screen time do political candidates get before an election? During the 2016 Republican presidential primaries, Donald Trump consistently received more screen time than any other candidate (Figure 5a). In the competitive months of the primary season, from January to May 2016, Trump received 342 hours of screen time, while his closest Republican rival, Ted Cruz, received only 130 hours. In the same timespan, the leading Democratic candidates, Hillary Clinton and Bernie Sanders, received more nearly equal screen time (164 hours compared to 139 hours for Clinton); both received far more screen time than the other Democratic primary candidates (Figure 5b). Comparing the two presidential nominees, during the period from January 1, 2016 to election day, Trump received 1.9 times more screen time than Clinton.
Unlike Trump in 2016, in the run up to the 2012 presidential election, Mitt Romney (the eventual Republican nominee) did not receive as dominating an amount of screen time (Figure 5c). Other Republican candidates such as Herman Cain, Michele Bachmann, Newt Gingrich, and Rick Santorum have higher peaks than Romney at varying stages of the primary season, and it is not until April 2012 (when his last rival withdraws) that Romney’s screen time decisively overtakes that of his rivals. For reference, Figure 5d shows the screen time of Barack Obama during the same period. As the incumbent president up for re-election, Obama had no significant primary challenger. Obama received more screen time throughout 2011 than Romney because, as the president, he is in the news for events and policy actions related to his duties as president (e.g., U.S. missile strikes in Libya, job growth plan, etc.); in 2012, however, they are comparable. The overall trends are similar when viewed by channel, with Trump dominating screen time in 2016 on all three channels (Supplemental 2.1.3).
Who presents the news? Cable TV news programs feature hosts, anchors and on-air staff (e.g., contributors, meteorologists) to present the news. We manually marked 325 of the public figures who we identified in our data set as news presenters (107 on CNN, 130 on FOX, and 88 on MSNBC). Overall, we find that a news presenter is on screen 28.1% of the time – 27.4% on CNN, 33.5% on FOX, and 23.0% on MSNBC. On CNN, the percentage of time that a news presenter is on screen increases by 13% between 2015 and 2018, while, on FOX and MSNBC, it remains mostly flat over the decade (Figure 6a).
The news presenters with the most screen time are Anderson Cooper (1,782 hours) on CNN, Bill O’Reilly (1,094 h) on FOX, and Rachel Maddow (1,202 h) on MSNBC. Moreover, while the top presenter on each channel varies a bit over the course of the decade (Figure 7), Cooper and O’Reilly hold the top spot for relatively long stretches on CNN and FOX respectively. Also, while Maddow appears the most on MSNBC overall, Chris Matthews holds the top spot for the early part of the decade (2010 to 2014). However, since 2014, the top presenter on MSNBC has fluctuated on a monthly basis (Figure 7c). The 13% rise in screen time of news presenters on CNN that we saw earlier (Figure 6a) can largely be attributed to three hosts (Anderson Cooper, Chris Cuomo, and Don Lemon) whose screen time increases by factors of 2.5, 4.5, and 5.5 from 2015 onwards (Figure 7a) and who account for over a third of all news presenter screen time on CNN in 2019.
How does screen time of male- and female-presenting news presenters compare? The list of top news presenters by screen time is dominated by male-presenting individuals. Of the top five news presenters on each channel, accounting for 31% (CNN), 22% (FOX), and 34% (MSNBC) of news presenter screen time, only one is female on CNN and FOX and two on MSNBC (Figure 7). Across all three channels, there is a shift towards gender parity in screen time of news presenters early in the decade followed by a divergence.
CNN exhibits gender parity for news presenters in January-June 2012 and May-August 2015 (Figure 6b). However, from September 2015 onward, CNN diverges as the 10% increase in the screen time of male-presenting news presenters (from 14% to 24%) outpaces the 3% increase for female presenters (13% to 16%). The increase in male-presenting news presenter screen time on CNN mirrors the increase in overall news presenter screen time on CNN due to an increase in the screen time for Anderson Cooper, Don Lemon, and Chris Cuomo (Figure 7a).
Similarly, the gender disparity of news presenters on FOX decreases from 2010 to 2016, but widens in 2017 due to an increase in the screen time of male-presenting news presenters (Figure 6c). This occurs around the time that former top hosts Megyn Kelly and Bill O’Reilly (6% and 5% of presenter screen time on FOX in 2016, respectively) departed from FOX. Their time is replaced by a rise in Tucker Carlson’s and Sean Hannity’s screen time (3% and 5% of news presenter screen time on FOX in 2016, rising to 11% and 7% in 2017 and 2018). The increase in female-presenting news presenter screen time in October 2017 occurs when Laura Ingraham’s Ingraham Angle and Shannon Bream’s FOX News @ Night debut.
On MSNBC, the disparity, as a percentage of news presenter screen time, increases from May 2017 to July 2019 (Figure 6d). This is due to declines in the screen time of both male- and female-presenting news presenters, with the female-presenting share falling more steeply. The percentage of time when male-presenting news presenters are on screen falls from 17% to 13%, while the percentage for female-presenting news presenters falls from 14% to 7%. Unlike on CNN and FOX, the decline is more broadly distributed across news presenters; the screen time of the top five presenters from 2017 to 2019 is comparatively flat (Figure 7c).
Which news presenters hog the screen time on their shows? We compute the percentage of time a news presenter is on screen on their own show (“screenhog score”) and plot the top 25 “screenhogs” (Figure 8). Chris Cuomo (CNN) has the highest fraction of screen time on his own show (visible 70.6% of the time on Cuomo Primetime), while Tucker Carlson (FOX) is second at 55.3% on Tucker Carlson Tonight. These results can be attributed to the format of these two shows; Cuomo and Carlson both do interviews and often show their own reactions to guests’ comments. Carlson also regularly monologues while on screen. Compared to both CNN and MSNBC, FOX has the most screenhogs (13 of the top 25), many of whom are well-known hosts of FOX’s opinion shows. The top presenters by channel, Bill O’Reilly, Anderson Cooper, and Rachel Maddow, also break the top 25, with screenhog scores of 28.5%, 28.3%, and 24.2%, respectively.
What is the average age of news presenters? We obtain the birth date for each of the 325 news presenters from Wikipedia (wikipedia) and then compute the average age of news presenters on each channel when they are on screen (Supplemental 1.8). From 2010 to 2019, the average age of news presenters rises from 48.2 to 51.0 years (Figure 10). This trend is visible for all three channels, though there are localized reversals, often marked by retirements of older, prominent hosts; for example, the average news presenter’s age on CNN falls slightly after Larry King’s retirement in 2010 at age 76. Across all three channels, female-presenting news presenters are younger than their male-presenting counterparts by 6.3 years. However, the gap has narrowed in recent years.
Are female-presenting news presenters disproportionately blonde? We manually annotated the hair color (blonde, brown, black, other) of 145 female news presenters and computed the screen time of these groups (Supplemental 1.9). We find that blonde news presenters account for 64.7% of female-presenting news presenter screen time on FOX (compared to 28.8% for non-blonde news presenters), giving credence to the stereotype that female-presenting news presenters on FOX fit a particular aesthetic that includes blonde hair (advanced, for example, in The Guardian (foxisblonde)). However, FOX is not alone; the proportion of blonde news presenters on CNN (56.6% overall, 58.2% since 2015, compared to 38.6% overall for non-blondes) has risen, and currently the chance of seeing a blonde female news presenter is approximately equal on the two networks (Figure 9). The screen time of blonde female presenters is lower on MSNBC (36.6%), while non-blonde female news presenters account for 55.7%. On MSNBC, brown is the dominant hair color (40.8%), but over half of that share (21.4%) is due to a single brown-haired host (Rachel Maddow). On all three channels, the percentage of blonde female news presenters far exceeds the natural rate of blondness in the U.S. (% according to the Bureau of Labor Statistics (nlsy79)).
3. How are individuals portrayed?
Editorial decisions about the images and graphics to include with stories can subtly influence the way viewers understand a story. We examine such editorial choices in the context of the Trayvon Martin shooting.
Which photos of Trayvon Martin and George Zimmerman appeared most often on each channel? On February 26, 2012, Trayvon Martin, a 17-year-old high-school student, was fatally shot by neighborhood watchman George Zimmerman (trayvonshootingfacts). Media depictions of both Martin and Zimmerman were heavily scrutinized as the story captured national interest (trayvonphotos; foxtrayvon). We identified unique photographs of Martin and Zimmerman in our data set using a K-NN classifier on FaceNet descriptors (facenet) and tabulated the screen time of these photos (see Supplemental 1.10).
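The K-NN matching step can be sketched as follows. This is a minimal illustration, not our actual classifier: the 2-D vectors stand in for high-dimensional FaceNet descriptors, and the photo labels are hypothetical.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def knn_label(query, labeled, k=3):
    """Label a query descriptor by majority vote among its k nearest
    labeled descriptors (nearest = highest cosine similarity)."""
    nearest = sorted(labeled, key=lambda item: cosine_sim(query, item[0]),
                     reverse=True)[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy descriptors for faces drawn from two distinct photographs.
labeled = [((1.0, 0.0), "photo_A"), ((0.9, 0.1), "photo_A"),
           ((0.0, 1.0), "photo_B"), ((0.1, 0.9), "photo_B")]
print(knn_label((0.95, 0.05), labeled))  # photo_A
```

In practice, each on-screen appearance of a photo yields a descriptor, and votes among its nearest labeled neighbors decide which source photograph it matches.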
Figure 11 shows the four photos of Martin (top row) and Zimmerman (bottom row) that received the most screen time in the aftermath of the shooting and during Zimmerman’s 2013 trial. In the initial week of coverage, all three channels used the same image of Martin (purple). This image generated significant discussion about the “baby-faced” depiction of Martin, although it dated to a few months before the shooting. In the ensuing weeks (and later during Zimmerman’s trial), differences in how the three channels depict Martin emerge. CNN most commonly used a photograph of Martin smiling in a blue hat (blue box). In contrast, the most commonly shown photo on FOX depicts an unsmiling Martin (orange). MSNBC most frequently used the black-and-white image of Martin in a hoodie (pink) that was the symbol for protests in support of Trayvon and his family. The three different images reflect significant differences in editorial decisions made by the three channels.
Depictions of Zimmerman also evolved with coverage of the shooting, reflecting both efforts by channels to use the most up-to-date photos for the story at hand and the presence of editorial choices. All three channels initially aired the same image of Zimmerman (purple). The photo, depicting Zimmerman in an orange polo shirt, was both out of date and taken from a prior police incident unrelated to the Martin shooting. A more recent photograph of Zimmerman (pink) was made available to news outlets in late March 2012. While FOX and CNN transitioned to using this new photo, which depicts a smiling Zimmerman, a majority of the time, MSNBC continued to give more screen time to the original photo. After mid-April 2012, depictions of Zimmerman on all three channels primarily show him in courtroom appearances as the legal proceedings unfolded.
4. What is discussed in the news?
The amount of coverage that topics receive in the news can influence viewer perceptions of world events and newsworthy stories. As a measure of the frequency with which key topics are discussed, we count the number of times selected words appear in video captions.
How often are foreign countries mentioned? Foreign country names, defined in Supplemental 1.11, appear in the captions a total of 4.5M times. Most countries receive little coverage (Figure 12), and the eight countries with the highest number of mentions (Russia, Iran, Syria, Iraq, China, North Korea, Israel, and Afghanistan) account for 51% of all country mentions. Russia alone accounts for 11.2%. (If treated as a country, ISIS would rank 2nd, after Russia, at 8.4%.) Of these eight, five have been in a state of armed conflict in the last decade, while the other three have had major diplomatic rifts with the U.S. These data suggest that military conflict and tense U.S. relations beget coverage. No countries from South America or Southeast Asia appear in the top eight; the top countries from these regions are Venezuela (32nd) and Vietnam (25th). Mexico, which frequently appears due to disputes over immigration and trade, is 9th, while Canada is 21st.
Mentions of countries often peak due to important events. Figure 13 annotates these events for the 15 most often mentioned countries. For example, the Libyan Civil War in 2011, the escalation of the Syrian Civil War in 2012-2013, and the rise of ISIS (Syria, Iraq) in 2014 all correspond to peaks. The countries ranked 11 to 15 are otherwise rarely in the news, but the 2011 tsunami and Fukushima Daiichi nuclear disaster, the 2014 annexation of Crimea by Russia, and the Charlie Hebdo shooting and November Paris attacks (both in 2015) elevated Japan, Ukraine, and France to brief prominence.
Following the election of Donald Trump in 2016, there has been a marked shift in the top countries, corresponding to topics such as Russian election interference, North Korean disarmament talks, the Iran nuclear deal, and the trade war with China.
For how long do channels cover acts of terrorism, mass shootings, and plane crashes? We enumerated 18 major terrorist attacks (7 in the U.S. and 11 in Europe), 18 mass shootings, and 25 commercial airline crashes in the last decade, and we counted related N-grams such as terror(ism,ist), shoot(ing,er), and plane crash in the weeks following these events (Supplemental 1.12 gives the full lists of terms). Counts for terrorism and shootings return to the pre-event average after about two weeks (Figure 14a,b,c). Likewise, coverage of plane crashes also declines to pre-crash levels within two weeks (Figure 14d), though there are some notable outliers. Malaysia Airlines Flight 370, which disappeared over the Indian Ocean in 2014, remained in the news for nine weeks, and Malaysia Airlines Flight 17, shot down over eastern Ukraine, received coverage for four weeks as more details emerged, leading to a subsequent peak in coverage.
Is it illegal or undocumented immigration? “Illegal immigrant” and “undocumented immigrant” are competing terms that describe individuals who are in the U.S. illegally, with the latter term seen as more politically correct (illegalvsundoc). Figure 15 shows the counts of when variants of these terms are said (Supplemental 1.13 gives the full list of variants). Illegal is used on FOX the most (59K times); FOX also has more mentions of immigration overall. From 2012 onward, undocumented has increased in use on CNN and MSNBC, though illegal still appears equally or more often on these channels than undocumented.
How often are honorifics used to refer to Presidents Trump and Obama? Honorifics convey respect for a person or office. We compared the number of times that President (Donald) Trump is used to the number of other mentions of Trump’s person (e.g., Donald Trump, or just Trump). When computing the number of mentions of just Trump, we exclude references to nouns such as the Trump administration and Melania Trump that contain the word Trump but do not refer to Donald Trump (Supplemental 1.14 gives the full list of exclusions).
The term President Trump only emerges on all three channels following his inauguration to the office in January 2017 (Figure 16a-c). President is used nearly half of the time on CNN and FOX after his inauguration. By contrast, MSNBC continues to most commonly refer to him as Trump, without using the honorific term President. We plot similar charts of President Obama over the course of his presidency from 2010 to January 2017 (Figure 16d-e) and find that, on all three channels, the honorific term President is used more often than not. Also, we find that Trump, in general, is mentioned approximately 3 times more often than Obama on a monthly basis during the periods of their respective presidencies in our data set. These data suggest that although coverage of the incumbent president has increased since the start of Trump’s presidency in 2017, the level of formality when referring to the president has fallen.
5. Who is on screen when a word is said?
People are often associated with specific topics discussed in cable TV news. We analyze the visual association of faces to specific topics by computing how often faces are on screen at the same time that specific words are mentioned. We obtain millisecond-scale time alignments of caption words with the video’s audio track using the Gentle word aligner (gentlealigner) (Supplemental 1.1).
Which words are most likely to be said when women are on screen?
By treating both face detections and words as time intervals, we compute the conditional probability of observing at least one female-presenting (or one male-presenting) face on screen given each word in the caption text (Supplemental 1.15). This conditional probability can be viewed analogously to TF-IDF weighting (manningnlp), where the term-frequency is the number of co-occurrences of the word and the individual’s face, and the document-frequency is the total number of times a word is said. Because of the gender imbalance in screen time, the conditional probability of a female-presenting face being on screen when any word is said is 29.6%, compared to 61.4% for male-presenting faces, so we are interested in words where the difference between female and male probabilities deviates from the baseline 31.9% difference.
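A simplified version of this computation treats each word utterance as a point in time and each face detection as an interval; the conditional probability is then the fraction of utterances that fall inside some interval. The sketch below illustrates the idea with hypothetical timestamps (it is not our pipeline code):

```python
def prob_face_given_word(word_times, face_intervals):
    """P(face on screen | word said): fraction of word utterance times
    that fall within at least one face-detection interval."""
    def on_screen(t):
        return any(start <= t <= end for start, end in face_intervals)
    if not word_times:
        return 0.0
    return sum(1 for t in word_times if on_screen(t)) / len(word_times)

# Hypothetical data: the word is said at t=1, 5, and 20 seconds;
# a matching face is on screen from t=0 to t=6.
print(prob_face_given_word([1.0, 5.0, 20.0], [(0.0, 6.0)]))  # 0.6666666666666666
```

The denominator plays the role of the document-frequency (total utterances of the word), and the numerator the term-frequency (co-occurrences with the face).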
Figure 17 shows the top 35 words most associated with male- and female-presenting faces on screen. For female-presenting faces, the words are about women’s health (e.g., breast, pregnant); family (e.g., boyfriend, husband, mom(s), mothers, parenthood, etc.); and female job titles (e.g., actress, congresswoman). Weather-related terms (e.g., temperatures, meteorologist, blizzard, tornadoes) and business news terms (e.g., futures, Nasdaq, stocks, earnings) are also at or near gender parity; we attribute this to a number of prominent female weatherpersons (Indra Petersons/CNN, Janice Dean/FOX, Maria Molina/FOX) and female business correspondents (Christine Romans/CNN, Alison Kosik/CNN, JJ Ramberg/MSNBC, Stephanie Ruhle/MSNBC, Maria Bartiromo/FOX) across much of the last decade. By contrast, the top words associated with male-presenting faces on screen are about foreign affairs, terrorism, and conflict (e.g., ISIL, Israelis, Iranians, Saudis, Russians, destroy, treaty); and fiscal policy (e.g., deficits, trillion, entitlement(s)). The stark difference in the words associated with female-presenting screen time suggests that, over the last decade, the subjects discussed on-air by presenters and guests varied strongly depending on their gender.
Who uses unique words? We define vocabulary to be “unique” to a person if the probability of that individual being on screen conditioned on the word being said (at the same time) is high. Table 1 lists all words for which an individual has a greater than 50% chance of being on screen when the word is said. (We limit analysis to words mentioned at least 100 times.) Political opinion show hosts (on FOX and MSNBC) take the most creative liberty in their words, accounting for all but three names in the list.
| Person | Unique words (%) |
| --- | --- |
| Bill O’Reilly (FOX) | opine (60.6), reportage (59.0), spout (58.6), urchins (57.9), pinhead[ed,s] (49.0, 51.5, 50.2) |
| Ed Schultz (MSNBC) | classers (71.2), beckster (61.6), drugster (59.9), righties (55.2), trenders (60.8), psychotalk (54.2) |
| Tucker Carlson (FOX) | pomposity (76.2), smugness (71.5) |
| Sean Hannity (FOX) | abusively (76.1), Obamamania (53.3) |
| Glenn Beck (FOX) | Bernays (82.3), Weimar (62.2) |
| Rachel Maddow (MSNBC) | [bull]pucky (47.9, 50.7), debunktion (51.4) |
| Chris Matthews (MSNBC) | rushbo (50.5) |
| Kevin McCarthy (politician) | untrustable (75.9) |
| Chris Coons (politician) | Delawareans (63.8) |
| Hillary Clinton (politician) | generalistic (56.5) |
Which presenters are on screen when the President honorific is said? A news presenter’s use of the President honorific preceding Trump or Obama might set a show’s tone for how these leaders are portrayed. When a presenter is on screen, we find that the honorific term President is used a greater percentage of time for Obama than for Trump, during the period of their presidencies (Figure 19). On all three channels, most presenters lie below the parity line. However, the average FOX presenter is closer to parity between uses of the term President to refer to Trump and Obama (a few FOX presenters lie above the line) than the average presenter on CNN and MSNBC. Figure 18 shows how the top hosts (by screen time) on each channel are associated with uses of President to refer to Trump over time.
How much was Hillary Clinton’s face associated with the word email? Hillary Clinton’s emails were a frequent news topic in 2015 and during the 2016 presidential election due to investigations of the 2012 Benghazi attack and her controversial use of a private email server while U.S. Secretary of State. During this period, Clinton’s face was often on screen when these controversies were discussed, visually linking her to the controversy. We compute that during the period spanning 2015 to 2016, Hillary Clinton’s face is on screen during 11% of mentions of the word email(s) (Figure 20), a significantly higher percentage than the 1.9% of the time that she is on screen overall. This degree of association is similar across all three channels (Supplemental 2.3.1).
6. Interactive Visualization Tool
We have developed an interactive, web-based visualization tool (available at https://tvnews.stanford.edu) that enables the general public to perform analyses of the cable TV news data set (Figure 21). Our design, inspired by the Google N-gram Viewer (ngrams), generates time-series line charts of the amount of cable TV news video (aggregate time) matching user-specified queries. Queries may consist of one or more filters which select intervals of time when a specific individual appears on screen (name="..."), an on screen face has a specific presented gender (tag="male"), a keyword or phrase appears in the video captions (text="..."), or the videos come from a particular channel (channel="CNN"), program, or time of day. Clicking on the graph allows users to view the videos matching the query.
To construct more complex analyses, the tool supports queries containing conjunctions and disjunctions of filters, which serve to intersect or union the video time intervals matched by individual filters (name="Hillary Clinton" AND text="email" AND channel="FOX"). We implemented a custom in-memory query processing system to execute screen time aggregation queries over the entire cable TV news data set while maintaining interactive response times for the user. In addition to generating time-series plots of video time, the tool enables users to directly view video clips (and their associated captions) that match queries.
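Conjunctions of filters reduce to intersecting the sorted interval lists matched by each filter (disjunctions are the analogous union). Below is a minimal sketch of the intersection step, with hypothetical intervals rather than our actual query engine:

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists
    (the time matched by filter A AND filter B)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# e.g., intervals where a face filter matches AND a caption filter matches
print(intersect([(0, 10), (20, 30)], [(5, 25)]))  # [(5, 10), (20, 25)]
```

Because every filter yields intervals of this same form, filters compose arbitrarily; summing the resulting interval durations gives the aggregate screen time plotted in the time-series charts.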
A major challenge when developing this tool was making an easy-to-use, broadly accessible data analysis interface, while still exposing sufficient functionality to support a wide range of analyses of who and what appears on cable TV news. We call out three design decisions made during tool development.
(1) Limit visualization to time-series plots. Time-series analysis is a powerful way to discover and observe patterns over the decade spanned by the cable TV news data set. While time-series analysis does not encompass the full breadth of analyses presented in this paper, we chose to focus the visualization tool’s design on the creation of time-series plots to encourage and simplify this important form of analysis.
(2) Use screen time as a metric. We constrain all queries, regardless of whether visual filters or caption text filters are used, to generate counts of a single metric: the amount of screen time matching the query. While alternative metrics, such as using word counts to analyze caption text (section 4) or counts of distinct individuals to understand who appears on a show, may be preferred for certain analyses, we chose screen time because it is well suited to many analyses focused on understanding representation in the news. For example, a count of a face’s screen time directly reflects the chance a viewer will see a face when turning on cable TV news. Also, word counts can be converted into screen time intervals by attributing each instance of a word, regardless of its actual temporal extent, to a fixed interval of time (textwindow="..."). As a result, our tool can be used to effectively perform comparisons of word counts as well.
Our decision to make all filters select temporal extents simplified the query system interface. Because every filter yields a selection of time intervals, filters can be arbitrarily composed in queries that combine information from face identity labels and captions. A system where some filters yielded word counts and others yielded time intervals would complicate the user experience, as it would introduce the notion of different data types into queries.
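The textwindow conversion described above can be illustrated with a short sketch. This is a hypothetical rendering of the idea (the function name and default window width are assumptions, not the tool's API): each word instance is expanded to a fixed-width interval centered on its timestamp, and overlapping intervals are merged so they can be composed with other interval-valued filters.

```python
# Sketch of converting caption word counts into screen time intervals:
# each word instance, regardless of its true duration, is attributed a
# fixed window of time centered on its timestamp. The 15-second default
# is an assumption for illustration only.

def word_windows(timestamps, window=15.0):
    """Map word-instance timestamps (seconds) to merged fixed-width intervals."""
    intervals = sorted((t - window / 2, t + window / 2) for t in timestamps)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlapping windows merge into a single interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Three mentions of a word; the first two windows overlap and merge.
print(word_windows([100.0, 110.0, 200.0]))
# [(92.5, 117.5), (192.5, 207.5)]
```

After this conversion, a caption-text filter behaves like any other interval-valued filter, which is what allows word-count comparisons to be expressed as screen time comparisons.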
(3) Facilitate inspection of source video clips. We found it important for the visualization tool to support user inspection of the source video clips that match a query (Figure 21-right). Video clip inspection allows a user to observe the context in which a face or word appears in a video. This context in turn is helpful for understanding why a clip was included in a query result, which facilitates deeper understanding of trends being investigated, aids the process of debugging and refining queries, and helps a user assess the accuracy of the automatically generated video labels relied on by a query.
7. Limitations and Discussion
Annotating video using machine learning techniques enables analysis at scale, but it also presents challenges due to the limitations of automated methods. Most importantly, the labels generated by computational models have errors, and understanding the prevalence and nature of labeling errors (including forms of bias) is important for building trust in analysis results. Labeling errors also have the potential to harm individuals who appear in cable TV news, in particular when the errors relate to gender or race (excavatingai; imagenetbias; buolamwini:2018:gendershades). As a step toward understanding the accuracy of labels, we validated the output of our face and commercial detection, presented gender estimation, and person identification models (for a small subset of individuals) against human-provided labels on a small collection of frames. The details of this validation process, and the measured accuracy of the models, are provided in supplemental material.
Despite errors in our computational labeling methods at the individual level, aggregate data about gender representation over time on cable TV news is useful for understanding gender disparities. Many questions about representation in cable TV news media similarly concern the subject of race, but we are unaware of any computational model that can accurately estimate an individual’s race from their appearance (models we have seen have much lower accuracy than models for estimating presented gender). However, it may be possible to automatically determine the race of individuals for whom we have an identity label by using external data sources to obtain the individual’s self-reported race. A similar procedure could also be used to obtain the self-identified gender of an individual, reducing our reliance on estimating presented gender from their appearance. Such approaches could further improve our understanding of race and gender in cable TV news.
While our query system can determine when a specific individual’s face is on screen as a word is spoken, it does not perform automatic speaker identification. As a result, the on-screen face may not be the speaker, e.g., when a news presenter delivers narration over silent B-roll footage. Extending our system to perform automatic speaker identification (Ephrat:2018:LookToListen) would allow it to directly support questions about the speaking time of individuals in news programs or about which individuals spoke about which stories.
Our system also lacks mechanisms for automatically distinguishing among the different formats of face appearances. For example, an individual’s face may be on screen because it is included in an infographic, because the individual appears directly on the program (as a contributor or guest), or because they are shown in B-roll footage. The ability to differentiate these cases would enable new analyses of how the news covers individuals.
Finally, we believe the ability to identify duplicate clips in the data set would prove useful in future analyses. For example, duplicate clips can signal re-airing of programs or replaying of popular sound bites. We would also like to connect analyses with additional data sources, such as political candidate polling statistics (fivethirtyeight) and the number and demographics of viewers (nielsen). Incorporating these sources would enable analysis of how cable TV news impacts politics and viewers more generally. We are working with several news organizations to deploy private versions of our tool on their internal video archives.
8. Related Work
Manual analysis of news and media. There have been many efforts to study trends in media presentation, including analysis of video editing choices (hallin1992; barnhurst1997; bucy2007; quickerdarker), coverage of political candidates (taiwanbias), the prevalence of segment formats (e.g., interviews (interviewfrequency)), and representation by race and gender (bbc5050; gmmp; WMC:2017:womeninmedia; MediaMatters:2016:diversity). These efforts rely on manual annotation of media, which limits analysis to small amounts of video (e.g., a few hundred shows (hallin1992; bucy2007), five Sunday morning news shows (MediaMatters:2016:diversity)) or even to the anecdotal observations of a single journalist (foxtrayvon; trumpfreemedia). The high cost of manual annotation makes studies at scale rare. For example, the BBC 50:50 Project (bbc5050), which audits gender representation in news media, depends on self-reporting from newsrooms across the world, and the GMMP (gmmp) relies on a global network of hundreds of volunteers to compile a report on gender representation every five years. While automated techniques cannot generate the same variety of labels as human annotators (GMMP requires a volunteer to fill out a three-page form for each story they annotate (gmmp)), we believe annotation at scale using computational techniques stands to complement these manual efforts.
Automated analysis of media. Our work was heavily inspired by the Google N-gram viewer (ngrams) and Google Trends (GOOGtrends), which demonstrate that automated computational analysis of word frequency, when performed at scale (to centuries of digitized books, or the world’s internet search queries) can serve as a valuable tool for studying trends in culture. These projects allow the general public to conduct analyses by creating simple time series visualizations of word frequencies. We view our work as bringing these ideas to cable TV news video.
Our system is similar to the GDELT AI Television Explorer (gdelt_ai_explorer), which provides a web-based query interface for caption text and on-screen chyron text in the Internet Archive’s cable TV news data set, and which recently added support for querying the objects that appear on screen. Our work analyzes nearly the same corpus of source video, but unlike GDELT we label the video with information about the faces on screen. We believe information about who is on screen is particularly important to many analyses of cable TV news media, such as those in this paper.
In general, there is growing interest in using automated computational analysis of text, images, and videos to facilitate understanding of trends in media and the world. This includes mining print news and social media to predict civil unrest (embers; embers4y) and forced population migration (forcedmigration), using facial recognition on TV video streams to build connectivity graphs between politicians (japanfaces), using gender classification to quantify the lack of female representation in Hollywood films (geenadavis), understanding presentation style and motion in “TED talk” videos (huaminvideo; huaminemoco), identifying trends in fashion (Matzen:2017:StreetStyle; Ginosar:2017:yearbook) from internet images, or highlighting visual attributes of cities (Doersch:2012:Paris; arietta2014city). These serve as interesting examples of the types of future analyses that could be performed on our cable TV news dataset.
Time series visualizations of word and document frequencies are commonly used to show changes in patterns of cultural production (epoch), and we take inspiration from advocates of “distant reading,” who make use of these visual representations to allow for insights that are impossible from manual inspection of document collections (moretti).
Alternative approaches for video analysis queries. A wide variety of systems exist for interactive video analysis, and existing work in interaction design has presented other potential approaches to formulating queries over video data sets. Video Lens (videolens) demonstrates interactive filtering using brushing and linking to filter complex spatio-temporal events in baseball video. The query-by-example approach (querybyexample) has been used in the image (cao2010mindfinder; ebaysearch; pinterestsearch; conceptcanvas) and sports (sha2016chalkboarding; sha2017fine) domains. These example-based techniques are less applicable to our visualization tool, which focuses on letting users analyze who and what is in cable TV news; typing a person’s name or a caption keyword is often easier for users than specifying these attributes by example.
Höferlin et al. (facetedexploration) and Meghdadi et al. (activeshotsummary) propose interactive methods to cluster and visualize object trajectories in order to identify rare events of interest in surveillance video. Analyzing motion-based events (e.g., hand gestures) in TV news is an area of future work; our current analyses target more visually static elements such as faces and their identities.
9. Conclusion
We have conducted a quantitative analysis of nearly a decade of U.S. cable TV news video. We demonstrate that automatically generated video annotations, such as labels for when faces are on screen and when words appear in captions, can facilitate analyses that provide unique insight into trends in who and what appears on cable TV news. To make analysis of our data set accessible to the general public, we have created an interactive screen time visualization tool that ingests new video daily and allows users to compose video selection queries and generate time-series plots of screen time. We are excited to launch the tool to the general public, and we hope that it encourages further analysis of, and insight into, the presentation of this important form of news media.