U.S. Intellectual History Blog

Mining the Media II: Have We Actually Learned Anything?

Editor's Note

This is the concluding part of a two-part series by Aubrey Parke. Aubrey Parke is an M.A. Public History student and Graduate Assistant at Duquesne University. In the past, she has worked as an oral historian, community archivist, and consulting firm analyst. She is from San Antonio, Texas, where she dedicates her spare time to work on immigration and refugee issues.

If my analysis had served only to illustrate media bias, the tedious process of locating, saving, cleaning, uploading, and analyzing a corpus would not be worth the trouble. After all, almost everyone already knows that Fox leans right and CNN leans left. While my experiment with Voyant is only a starting point, I believe that text mining digital media sources on a larger scale can be useful to both researchers and the general public.

Below, I outline four uses for text-mining online news content.

1. Evaluating new media sources

I included Axios in my analysis partly because it claims to be neutral, but also because it is fairly obscure and most people – myself included – do not have preconceptions about its political leaning. Axios’ tweets do relay a nuanced narrative about topics like defunding the police and the 2020 election, one that is often critical of both political parties. More than other news sources, however, Axios emphasizes technology and business news on Twitter. These tweets promote events and interviews with the CEOs of companies like Verizon, Bank of America, and Qualcomm. In fact, Verizon and Qualcomm were among the frequent terms unique to Axios. Despite tweeting far less often, Axios tweeted about the economy almost as many times as CNN during the same one-month period.

Figure 1: Relative frequency of the terms “tech,” “technology,” “technologies,” “CEO,” “CEO’s,” “CEOs,” “business,” “businesses,” “digital,” and “digitally.”

Figure 2: Total uses of the word “economy” – 22 for Axios and 26 for CNN.

This focus on technology and the economy, while not inherently political, suggests that Axios appeals to an audience that is deeply invested in business, finance, and innovation. Such an interest may affect how Axios covers other topics, including domestic and international social unrest. Text analysis allows us to read Axios with a more critical eye, going beyond the claims of nonpartisan neutrality.
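The grouped measure in Figure 1 is easy to approximate outside Voyant. Below is a minimal Python sketch, not Voyant’s implementation: it assumes each outlet’s tweets have been saved as a list of strings, the functions tokenize and relative_frequency are hypothetical, and the sample tweets are invented. It reports the share of all tokens that match any word in the group, which lets outlets that tweet at very different volumes be compared; Voyant may scale its relative frequencies differently.

```python
import re
from collections import Counter

def tokenize(text):
    # Normalize curly apostrophes, lowercase, and keep runs of letters,
    # digits, and apostrophes so that "CEO’s" stays a single token.
    return re.findall(r"[a-z0-9']+", text.replace("\u2019", "'").lower())

def relative_frequency(texts, terms):
    """Share of all tokens in texts that match any word in terms."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # A set avoids double counting spellings that collapse to the same token.
    wanted = {t.replace("\u2019", "'").lower() for t in terms}
    return sum(counts[t] for t in wanted) / total

# Term group from Figure 1 (matched case-insensitively).
TECH_TERMS = ["tech", "technology", "technologies", "CEO", "CEO’s", "CEOs",
              "business", "businesses", "digital", "digitally"]

# Hypothetical tweets, invented purely for illustration.
sample = ["Tech CEOs talk business", "The economy and digital business"]
print(relative_frequency(sample, TECH_TERMS))  # 5 of 9 tokens match
```

Running the same function over each outlet’s list would give one comparable number per outlet for the whole term group.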

2. Uncovering the unexpected

My approach to mining the media has many limitations, not least because I started with presuppositions about the biases in my corpus. Despite this methodological flaw, some of my findings still surprised me. For example, I assumed that CNN’s Twitter account would capitalize on race and immigration during the 2020 election. Instead, I found that all five sources talked about racism at about the same frequency, and that CNN mostly avoided the topics of immigration, BLM, and police brutality.

Figure 3: Frequency of the words “immigrant,” “immigrants,” “immigration,” “immigrate,” “immigrated,” “illegal,” “illegals,” “illegally,” “alien,” and “undocumented.”

Figure 4: Frequency of the terms “racism,” “racist,” and “systemic racism.”

3. Detecting dogwhistles

Many people, including Vice President-elect Kamala Harris, have used the term “dogwhistling” this year. Currently, “dogwhistling” often describes how white supremacist groups communicate publicly through seemingly innocuous words and phrases that only other white supremacists will recognize. The term, however, can apply to any use of coded insider language. Text mining may help researchers identify different groups’ dogwhistles and make this information available to the public. For example, does saying “coronavirus” instead of “COVID” suggest a more laissez-faire attitude toward preventing its spread? Does using the phrase “big tech” instead of just “tech companies” suggest a critical or even conspiratorial attitude toward tech corporations? Here, text mining can help us read between the lines.
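As a rough first pass at questions like these, one could measure how often each outlet chooses one phrasing over its alternative. The Python sketch below assumes tweets grouped by outlet; the outlet names, the tweets, and the functions ngram_counts, count_phrase, and preference are all hypothetical. It counts a multi-word phrase such as “big tech” as consecutive tokens and returns the share of uses that go to the first phrasing. A lopsided share only flags wording worth reading closely; it does not by itself show coded intent.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    # Lowercase runs of letters, digits, and apostrophes;
    # "COVID-19" becomes the tokens "covid" and "19".
    tokens = re.findall(r"[a-z0-9']+", text.replace("\u2019", "'").lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def count_phrase(texts, phrase):
    """Occurrences of a one- or multi-word phrase across all texts."""
    words = phrase.lower().split()
    key = " ".join(words)
    return sum(ngram_counts(text, len(words))[key] for text in texts)

def preference(texts, term_a, term_b):
    """Share of term_a among uses of term_a or term_b; None if neither appears."""
    a, b = count_phrase(texts, term_a), count_phrase(texts, term_b)
    return None if a + b == 0 else a / (a + b)

# Hypothetical outlets and tweets, invented purely for illustration.
corpora = {
    "Outlet A": ["Coronavirus cases rise", "New coronavirus guidance", "COVID-19 vaccine news"],
    "Outlet B": ["COVID testing expands", "Big tech faces scrutiny"],
}
for outlet, tweets in corpora.items():
    print(outlet,
          preference(tweets, "coronavirus", "covid"),
          preference(tweets, "big tech", "tech companies"))
```

Returning None rather than 0 keeps “never used either phrasing” distinct from “always chose the second one.”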

4. Indexing the future digital archive

If researchers carefully mine the media to document trends in content and language, future historians will be better able to interpret our digital record. Picture this:

It’s 2121. You steer your hoverchair towards the 360-degree hologram station and open Newswebsites.com. (You’ve created a new email account so you can get another free 30-day trial.) Your grandmother told you stories about the 2020 COVID-19 pandemic, stories that sparked your interest and led you to research it for your dissertation. After all, 2020 is the reason you live on a Mars colony and drink melted ice caps instead of SmartWater.

You want to include diverse voices in your dissertation, but hundreds of thousands of articles clamor for your attention. Who should you listen to?

Suppose you had a map to guide you from source to source, showing you where to focus your research. For a far-right, populist perspective, try Breitbart. If it’s all about the money, look at WSJ. The New York Times went up in flames, but a blend of CNN and The New Yorker will help you understand what many academics and East Coast professionals were reading.

Text-miners can build a LibGuide for the future archive by indexing topic frequency, key terms, and unique phrases for different media outlets. This bias map would facilitate insightful research that captures a wide array of controversial, complex voices.
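One way to start such an index is to rank, for each outlet, the terms it uses at a much higher rate than all the other outlets combined. The Python sketch below uses a smoothed log-ratio of rates (add-one smoothing), a simple keyness measure chosen here for illustration; it is not necessarily what Voyant computes. The function distinctive_terms, its parameters k and min_count, and the mini-corpora are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.replace("\u2019", "'").lower())

def distinctive_terms(corpora, k=10, min_count=2):
    """Map each outlet to its top-k terms by smoothed log-ratio against all other outlets."""
    counts = {name: Counter(t for text in texts for t in tokenize(text))
              for name, texts in corpora.items()}
    vocab_size = len(set().union(*counts.values()))
    index = {}
    for name, own in counts.items():
        rest = Counter()
        for other, c in counts.items():
            if other != name:
                rest.update(c)
        # Add-one smoothing: a term the other outlets never use still gets a finite score.
        own_total = sum(own.values()) + vocab_size
        rest_total = sum(rest.values()) + vocab_size
        scores = {term: math.log((own[term] + 1) / own_total)
                        - math.log((rest[term] + 1) / rest_total)
                  for term, n in own.items() if n >= min_count}
        # Highest score first; ties broken alphabetically for stable output.
        index[name] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return index

# Hypothetical mini-corpora, invented purely for illustration.
corpora = {
    "Outlet A": ["markets markets economy", "markets rally"],
    "Outlet B": ["police protest", "protest police police"],
}
print(distinctive_terms(corpora, k=3))
# {'Outlet A': ['markets'], 'Outlet B': ['police', 'protest']}
```

The min_count threshold drops terms an outlet used only once, which trades some recall for rankings that are less driven by single tweets.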