Blog #4: Distant Reading, Text Mining, and Topic Modeling

This week, our Digital Tools for Historians class read about distant reading, text mining, and topic modeling. Of course, before this week, I had no idea what any of those phrases meant. First, we learned about “distant reading”, a research method introduced by Franco Moretti, an Italian literary scholar and founder of the Stanford Literary Lab. Moretti’s Literary Lab analyzes the plots of works of literature (such as Hamlet and King Lear) to detect hidden patterns. One approach maps the relationships among characters according to how often they interact or speak with one another in a play. That information reveals the identity of the story’s protagonist by pinpointing which character “minimized the sum of the distances to all other vertices.” This idea blew my mind.
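The phrase “minimized the sum of the distances to all other vertices” is, in graph terms, the character who is closest, on average, to everyone else in the network. A minimal sketch of that calculation, using a small invented Hamlet network (the edges below are illustrative guesses, not Moretti’s actual data):

```python
from collections import deque

# Toy character network (invented for illustration): an edge means
# two characters speak with or appear alongside one another.
edges = [
    ("Hamlet", "Horatio"), ("Hamlet", "Claudius"), ("Hamlet", "Gertrude"),
    ("Hamlet", "Ophelia"), ("Claudius", "Gertrude"), ("Claudius", "Polonius"),
    ("Polonius", "Ophelia"), ("Polonius", "Laertes"), ("Laertes", "Ophelia"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def distance_sum(graph, start):
    """Sum of shortest-path distances from `start` to every other character (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(dist.values())

# The "protagonist" by this measure minimizes the sum of distances.
protagonist = min(graph, key=lambda c: distance_sum(graph, c))
print(protagonist)  # → Hamlet
```

Even on this tiny invented network, the measure singles out Hamlet, since he is directly connected to most other characters.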



According to an article by Kathryn Schulz called “What is Distant Reading?”, distant reading is “understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data.” In short, instead of reading novels, distant reading entails mapping, graphing, and diagramming literary works. However, I cannot help but wonder what we are supposed to do with that information. Any Shakespeare fan will already know the identity of the protagonist. Still, I am intrigued by distant reading, and I wonder how I, as a historian, could utilize it in my research.

We also read an article by William G. Thomas called “Computing and the Historical Imagination”. In it, Thomas discusses the work of historians such as Edward L. Ayers, Robert Darnton, and Roy Rosenzweig, who advocate the use of digital tools in historical analysis and research. However, Thomas also touches upon several weaknesses in the fusion of computation and historical research. Rosenzweig cautions historians about the problem of information overload: “Historians, in fact, may be facing a fundamental paradigm shift from a culture of scarcity to a culture of abundance.” At the same time, he points out the fleeting nature of electronic records, which are being lost every day.

Thomas’s article also discusses interpretive and imaginative digital creations that utilize historical GIS to recreate “lost landscapes” that readers can move and navigate through. My mind immediately went to video games such as Assassin’s Creed Unity and the soon-to-be-released Assassin’s Creed Syndicate. Historical action-adventure games like Unity and Syndicate let the player experience Paris during the French Revolution and Victorian London, respectively. The company that makes the games, Ubisoft, hired historians, architects, cartographers, video game designers, and other experts to recreate the two cities as they once were. The player learns a significant amount about history while playing: the games are interactive, educational, and entertaining. While some may scoff at video games as “edutainment”, I believe historical video games can become powerful tools for historians. I am not sure whether video games are even something Thomas has in mind when he discusses navigating “lost landscapes”, but they are still a way for historians to make learning about history fun and interactive.

A scene from Assassin’s Creed Syndicate. The video game allows the player to explore Victorian London.
Notre-Dame Cathedral in Assassin’s Creed Unity.

Our class was also introduced to the idea of topic modeling, a text mining technique. Scott Weingart wrote an easy-to-understand article about it called “Topic Modeling for Humanists: A Guided Tour”. As Weingart explains, topic modeling is “a class of computer programs that automatically extract topics from texts”. I have to admit that the article becomes a little too complicated for me when Weingart attempts to explain LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). However, I did grasp that LDA is a distinct kind of topic modeling introduced by David Blei and his colleagues in the early 2000s. I suppose I do not have to know everything about topic modeling to understand how it works, and I must say that I am impressed by it.

Weingart uses speeches by President Barack Obama as an example in order to help readers understand topic modeling. For instance, if one wanted to know which topics President Obama discusses the most in his speeches, a topic modeling tool could be used to reveal several lists of words. Each of the lists is a “topic.” Weingart explains that the lists might look like this:

1. Job Jobs Loss Unemployment Growth
2. Economy Sector Economics Stock Banks
3. Afghanistan War Troops Middle-East Taliban Terror
4. Election Romney Upcoming President

Topic modeling requires special software packages. It can be used to analyze an eighteenth-century diary, to examine old newspapers, and to detect themes in other primary sources. I am very intrigued by topic modeling and look forward to learning more about it. Robert K. Nelson’s New York Times article, “Of Monsters, Men — and Topic Modeling”, E. Thomas Ewing’s article “Mining Coverage of the Flu: Big Data’s Insights into an Epidemic”, and Dan Cohen and Fred Gibbs’ article, “A Conversation with Data: Prospecting Victorian Words and Ideas” all indicate ways in which topic modeling can be used to gain historical understanding of events.

There are some impressive projects out there which utilize topic modeling. For instance, Mining the Dispatch examines over 100,000 articles from a Confederate-era newspaper, the Richmond Daily Dispatch, in order to uncover themes as well as “broad and subtle patterns in Civil War news that we would otherwise be unable to detect.” As Alexis de Tocqueville once said, “Nothing but a newspaper can drop the same thought into a thousand minds at the same moment.” Analyzing newspapers can give insights into the types of thoughts that were being “dropped” into the minds of newspaper readers.

We were also introduced to several interesting tools:

1. Google Books Ngram Viewer
2. JSTOR Data for Research
3. Chronicle: Mapping New York Times Language Use Over Time
4. TAPoR: Textual Analysis Portal

I really enjoyed using the Google Books Ngram Viewer! I went to the site and looked up “Florida” and “WPA”, and the graphs of how often each term appears in books over time were fascinating. I tried different words as well, and all of them returned results. The tool is sure to be a big help as I research my project topic (Florida Folk Music and the WPA), and I know it will help me with other projects as well. I also loved the JSTOR Data for Research tool! I searched the same topics as I did in the Ngram Viewer and many results appeared. How did I not know about these amazing tools? I will definitely be able to use them for my digital project and my eventual thesis on Florida folk music.
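Under the hood, the Ngram Viewer essentially charts a word’s relative frequency among the words printed in books each year. A toy sketch of that computation, with an invented three-year mini-corpus standing in for Google’s book data:

```python
from collections import Counter

# Invented mini-corpus: year -> text published that year (not Google's data).
corpus = {
    1935: "the wpa hired folk musicians across florida and beyond",
    1936: "florida folk music thrived as the wpa expanded its programs",
    1937: "wpa projects documented florida folk songs for the archive",
}

def ngram_frequency(corpus, word):
    """Relative frequency of `word` per year, the quantity an n-gram viewer charts."""
    freq = {}
    for year, text in corpus.items():
        tokens = text.lower().split()
        freq[year] = Counter(tokens)[word] / len(tokens)
    return freq

for year, f in sorted(ngram_frequency(corpus, "florida").items()):
    print(year, round(f, 3))
```

The real tool does the same thing at the scale of millions of scanned books, which is what makes a simple frequency chart historically revealing.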

I did not expect to enjoy this week’s topics as much as I did, and I am excited to learn more about how to use digital tools in my historical research and in my project for the class. I have only been in the Digital Tools for Historians class for a month, but I have already learned so much. I am truly thankful for Dr. French’s knowledge, guidance, and patience. My mind has been opened, and now I see that the future of history does not have to entail sitting in a dusty cave and writing a long-winded book or article about history that may not speak to a large audience. Instead, the future of history can be exciting, interactive, collaborative, interdisciplinary, and even hopeful.

