
Basic text [data] analysis for research.

I recently became interested in learning some text-based data pre-processing (i.e., cleaning up and organizing) and analysis skills using the Python programming language. This is very new for me, but I can already envision situations where it may become useful in my research. Perhaps I can use natural language processing (NLP) to search for trends in the public policy literature, or to attempt to draw conclusions about how specific themes have evolved over time.

My interest was piqued during a visit to Northeastern’s School of Public Policy and Urban Affairs. While observing a class (with Professor Daniel Aldrich), I heard a talk by computational sociologist Professor Laura Nelson.

Below, without commentary, is the very basic output of my first bit of Natural Language Toolkit (NLTK) text analysis code, which I used to compare word frequencies in speeches from the recent Democratic and Republican National Conventions (August 2020). The code can be found here via GitHub.
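For readers who want a feel for what a word-frequency count involves, here is a minimal sketch. It is not the code linked above: it uses only Python's standard library (`re` and `collections.Counter`) as a stand-in for NLTK's tokenizer and frequency tools, and the function name `word_frequencies`, the tiny `STOPWORDS` set, and the sample sentence are all placeholders I made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real analysis would use a much fuller list (NLTK ships one).
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "is",
             "that", "we", "our", "for", "it", "will"}


def word_frequencies(text, top_n=10):
    """Return the top_n most common non-stopword tokens as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop common function words so content words stand out.
    tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens).most_common(top_n)


# Placeholder sentence, not taken from either speech.
sample = "The economy will grow and the economy will create jobs for the nation."
print(word_frequencies(sample, top_n=3))
```

Running the same function on the full text of each speech would give the kind of ranked word list that a frequency plot is built from.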

Quick Plots:





Most frequent words used in 2020 presidential nomination acceptance speeches by Joe Biden (top) and Donald Trump (bottom); recall that Joe Biden spoke one week prior.