Basic text [data] analysis for research.
I recently became interested in learning some text-based data pre-processing (i.e., cleaning up and organizing) and analysis skills using the Python programming language. This is very new for me, but I can already envision times when it may become useful in my research. Perhaps I can use natural language processing (NLP) to search for trends in the public policy literature, or to attempt to draw conclusions about how specific themes have evolved over time.
My interest was piqued during a visit to Northeastern’s School of Public Policy and Urban Affairs. While observing a class (with Professor Daniel Aldrich), I heard a talk by computational sociologist Professor Laura Nelson.
Without commentary, below I have included the very basic output of my first bit of Natural Language Toolkit (NLTK) text analysis code, which I used to compare word frequency in speeches from the recent Democratic and Republican National Conventions (August 2020). The code can be found here via GitHub.
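For readers who want a feel for what a word-frequency comparison involves, here is a minimal sketch. It is not the code from my GitHub repository: it uses only Python's standard library (`re` and `collections.Counter`) as a stand-in for NLTK's tokenizer and stop-word list. The `top_words` function, the short `STOP_WORDS` set, and the two sample texts are all hypothetical placeholders, and the sample sentences are not quotations from either speech.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list for illustration only;
# a real analysis would use a much fuller list (NLTK ships one).
STOP_WORDS = {
    "the", "and", "a", "an", "of", "to", "in", "is", "that",
    "we", "our", "for", "will", "i", "it",
}


def top_words(text, n=10):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop common function words so content words stand out.
    words = [t for t in tokens if t not in STOP_WORDS]
    return Counter(words).most_common(n)


# Placeholder texts, NOT excerpts from the actual convention speeches.
speech_a = "We will build the economy and we will protect the nation. The nation is strong."
speech_b = "Our country is great and our country will stay great for the people."

print(top_words(speech_a, 3))
print(top_words(speech_b, 3))
```

To compare two real speeches, one would read each transcript from a file, pass the text to `top_words`, and plot the resulting counts side by side.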
Quick Plots:

Most frequent words used in 2020 presidential nomination acceptance speeches by Joe Biden (top) and Donald Trump (bottom); recall that Joe Biden spoke one week prior.
