In this post, I apply text clustering techniques – hierarchical clustering, K-Means, and Principal Components Analysis – to every presidential state of the union address from Truman to Obama. I used R for the setup, the clustering, and the data vis.
It turns out that the state of the union writes the State of the Union more than the president does. The words used in the addresses appear linked to the era more than to an individual president or his party affiliation. However, there is one major exception in President George W. Bush, whose style and content marks a sharp departure from both his predecessors and contemporaries. You can see the R scripts and more technical detail on the process here. The State of the Union addresses up to 2007 are available here and the rest you can get here.
Read More »