A Data Application to Foretell the Next Silicon Valley?

Can we predict what the next hub of tech entrepreneurship will be? Could we pinpoint where the next real estate boom will be and invest there? Thanks to advances in machine learning and easier access to public data through Open Data initiatives, we can now explore these types of questions.


Data Science Wanderlust: Analyzing Global Health with Protein Sequences

Fifteen years ago, I had the unique opportunity to go on Semester at Sea, an around-the-world trip on a converted cruise ship that combined college coursework with stops in nine countries on four continents. This once-in-a-lifetime trip instilled in me a strong sense of wanderlust and a deep desire to give back to the global community.

Every Journey Begins with a Single Step

Fast-forward to a few months ago, when I joined Exaptive on an exciting new project. A large NGO enlisted us to analyze a massive set of historical data for countries. The goal: to develop a better, more granular means of grouping countries than the outdated and crude approach of "developed" and "developing." This large, complex, messy dataset and thorny problem were a great fit for my background in artificial intelligence and data science.


Effecting Change Using Social Influence Mapping

If you've ever tried to get a company to adopt new software, you know how challenging it can be. Despite what seem to you like obvious benefits, and despite your relentless communication, people selectively ignore or, worse, revolt against the change. Change efforts will even stumble in the face of this wisdom of the ages:


Text Analysis with R: Does POTUS Write the State of the Union or Vice Versa?

In this post, I apply text clustering techniques – hierarchical clustering, K-Means, and Principal Components Analysis – to every presidential State of the Union address from Truman to Obama. I used R for the setup, the clustering, and the data visualization.

It turns out that the state of the union writes the State of the Union more than the president does. The words used in the addresses appear linked to the era more than to an individual president or his party affiliation. However, there is one major exception: President George W. Bush, whose style and content mark a sharp departure from both his predecessors and his contemporaries. You can see the R scripts and more technical detail on the process here. The State of the Union addresses up to 2007 are available here and the rest you can get here.
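For readers who want a feel for what clustering text by word use involves, here is a minimal sketch of one of the three techniques, K-Means, on term-frequency vectors. It is written in Python with only the standard library, not the R used for the actual analysis, and it is not taken from the post's scripts. The four short "documents" are made-up placeholders rather than real addresses, and the function names and the `seeds` parameter are assumptions chosen for illustration.

```python
from collections import Counter
import math


def tf_vectors(docs):
    """Turn each document into a length-normalized term-frequency vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, iterations=20):
    """Plain Lloyd's algorithm.

    `seeds` lists the indices of the documents used as starting centroids
    (a hypothetical, deterministic choice; real runs usually seed randomly).
    """
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder "addresses": two share one vocabulary, two share another.
docs = [
    "war peace soviet defense peace",
    "soviet defense war missile",
    "internet jobs economy internet",
    "economy jobs technology internet",
]
labels = kmeans(tf_vectors(docs), seeds=[0, 2])
print(labels)
```

Documents that draw on the same vocabulary end up with the same label, which is the sense in which addresses can group by era rather than by president. A real analysis would also remove stop words and weight terms before clustering.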


Cowboys and Inventors: The Myth of the Lone Genius

I recently moved from Boston to Oklahoma City. My wife was offered a tenure-track position at the University of Oklahoma, which was too good an opportunity for her career for us to pass up. Prior to the move, I had done a lot of traveling in the US, but almost exclusively on the coasts, so I didn't know what living in the southern Midwest would bring, and I was a bit trepidatious. It has turned out to be a fantastic move. There is a thriving high-tech startup culture here. I've been able to hire some great talent out of the University, and we're now planning to build up a big Exaptive home office here. Even more important, I was delighted to find a state that is extremely focused on fostering creativity and innovation. In fact, the World Creativity Forum is being hosted here this week, and I was asked to give a talk about innovation. As I thought about what I wanted to say, I found myself thinking about . . . cowboys.


The Data Scientific Method

The Oxford English Dictionary defines the scientific method as "a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses." With more scientists working today than ever before, the scientific method is alive and well, and it is generating unprecedented amounts of data. This explosion of data has brought about the field of data science and an associated plethora of analytics tools. More controversially, some have claimed, as in this Wired magazine article, that data science is so powerful that it has made the scientific method obsolete. Google's founding philosophy is that "we don't know why this page is better than that one. If the statistics of incoming links say it is, that's good enough." The implication is that with enough data, people will no longer need to know why something happens. It just does, and that's good enough. Is it, really?
